Abstract The Web of Linked Data is composed of tons of RDF documents interlinked
to each other forming a huge repository of distributed semantic data.
Effectively querying this distributed data source is an important open problem
in the Semantic Web area. In this paper, we propose LDQL, a declarative language
to query Linked Data on the Web. One of the novelties of LDQL is that
it expresses separately (i) patterns that describe the expected query result, and
(ii) Web navigation paths that select the data sources to be used for computing
the result. We present a formal syntax and semantics, prove equivalence rules,
and study the expressiveness of the language. In particular, we show that LDQL
is strictly more expressive than the query formalisms that have been proposed
previously for Linked Data on the Web. The high expressiveness allows LDQL
to define queries for which a complete execution is not computationally feasible
over the Web. We formally study this issue and provide a syntactic sufficient condition
to avoid this problem; queries satisfying this condition are ensured to have
a procedure to be effectively evaluated over the Web of Linked Data.

1 Introduction 

In recent years an increasing amount of structured data has been published and interlinked
on the World Wide Web (WWW) in adherence to the Linked Data principles [3].
These principles are based on standard Web technologies. In particular, (i) the Hypertext
Transfer Protocol (HTTP) is used to access data, (ii) HTTP-based Uniform Resource
Identifiers (URIs) are used as identifiers for entities described in the data, and (iii) the
Resource Description Framework (RDF) is used as the data model. Hence, any HTTP URI
in an RDF triple represents a data link that enables software clients to retrieve more data
by looking up the URI with an HTTP request. The adoption of these principles has led
to the creation of a globally distributed dataspace: the Web of Linked Data.

The emergence of the Web of Linked Data makes possible the online execution of
declarative queries over up-to-date data from a virtually unbounded set of data sources,
each of which is readily accessible without any need for implementing source-specific
APIs or wrappers. This possibility has spawned research interest in approaches to query
Linked Data on the WWW as if it were a single (distributed) database. For an overview
of query execution techniques proposed in this context, refer to [12].

The main contribution of this paper is the proposal of LDQL, a novel query language
for the Web of Linked Data. The most important feature of LDQL is that it
clearly separates query components for selecting query-relevant regions of the Web of
Linked Data from components for specifying the query result that has to be constructed
from the data in the selected regions. The most basic constructions in LDQL are tuples
of the form (L, Q) where L is an expression used to select a set of relevant documents,
and Q is a query intended to be executed over the data in these documents as if they
were a single RDF repository. In an abstract setting one can use several formalisms to
express L and Q. In our proposal, for the former part we introduce the notion of link
path expressions, which are a form of nested regular expressions (with some other important
features) used to navigate the link graph of the Web. For the latter, we use standard
SPARQL graph patterns. To begin evaluating these queries one needs to specify a set of
seed URIs. The language also possesses features to dynamically (at query time) identify
new seed URIs to evaluate portions of a query. Additionally, such queries can be combined
by using conjunctions, disjunctions, and projection. We present a formal syntax
and semantics for LDQL, propose some rewrite rules, and study its expressive power.

* This document is an extended version of a paper published in ISWC 2015 [13].

While no standard language exists for expressing queries over Linked Data on the
WWW, a few options have been proposed. In particular, a first strand of research
focuses on extending the scope of SPARQL such that an evaluation of SPARQL
queries over Linked Data has a well-defined semantics [9,11,14,18]. A second strand
of research focuses on navigational languages [7,14]. Although these languages have
different motivations, a commonality of all these proposals is that, in contrast to LDQL,
the definition of query-relevant regions of the Web of Linked Data and the definition of
query-relevant data within the specified regions are mixed.

As our second main contribution we compare LDQL with three previously proposed
formalisms for querying the Web of Linked Data: SPARQL under reachability-based
query semantics [11], NautiLOD [7], and SPARQL Property Path patterns under
context-based semantics [14]. We formally prove that LDQL is strictly more expressive
than every one of these. We show that for every query Q in the previous languages, one
can effectively construct an LDQL query which is equivalent to Q. Moreover, for every
one of the previous languages, there exists an LDQL query that cannot be expressed in
that language. These results show that the expressive power of LDQL subsumes that of all three formalisms.

The downside of the expressiveness provided by LDQL is the existence of queries 
for which a complete execution is not feasible in practice. To capture this issue formally, 
we define a notion of Web-safeness for LDQL queries. Then, the obvious question that 
arises is how to identify LDQL queries that are Web-safe. Our last technical contribution 
is the identification of a sufficient syntactic condition for Web-safeness. 

The rest of the paper is structured as follows. Section 2 introduces a data model 
that provides the basis for defining the semantics of LDQL. In Section 3 we formally 
define the syntax and semantics of LDQL and show some simple algebraic properties. 
In Section 4 we compare LDQL with the three mentioned languages, and in Section 5 
we focus on Web-safeness. Section 6 concludes the paper and sketches future work. 
Proofs of the formal results in this paper can be found in the Appendix. 

A preliminary version of some of the results in this paper has been presented at a
workshop [10]. This paper substantially extends [10], refining the definition of
LDQL and introducing important changes to the syntax and the semantics of the language.
Moreover, the comparison with previous proposals was not discussed in [10].

2 Data Model 

In this section we introduce a structural data model that captures the concept of a Web 
of Linked Data formally. As usual [7,9,11,14,18], for the definitions and analysis in this 
paper, we assume that the Web is fixed during the execution of any single query. 


We use the RDF data model [5] as a basis for our model of a Web of Linked Data.
That is, we assume three pairwise disjoint, infinite sets $\mathcal{U}$ (URIs), $\mathcal{B}$ (blank nodes), and
$\mathcal{L}$ (literals). An RDF triple is a tuple $(s,p,o) \in \mathcal{T}$ with $\mathcal{T} = (\mathcal{U} \cup \mathcal{B}) \times \mathcal{U} \times (\mathcal{U} \cup \mathcal{B} \cup \mathcal{L})$.
For any RDF triple $t = (s,p,o)$ we write $\mathrm{uris}(t)$ to denote the set of all URIs in $t$.

Additionally, we assume another infinite set $\mathcal{D}$ that is disjoint from $\mathcal{U}$, $\mathcal{B}$, and $\mathcal{L}$,
respectively. We refer to elements in this set as documents and use them to represent the
concept of Web documents from which Linked Data can be extracted. Hence, we assume
a function, say data, that maps each document $d \in \mathcal{D}$ to a finite set of RDF triples
$\mathrm{data}(d) \subseteq \mathcal{T}$ such that the data of each document uses a unique set of blank nodes.

Given these preliminaries, we are ready to define a Web of Linked Data. 

Definition 1. A Web of Linked Data is a tuple $W = (D, \mathrm{adoc})$ that consists of a set
of documents $D \subseteq \mathcal{D}$ and a partial function $\mathrm{adoc}\colon \mathcal{U} \to D$ that is surjective.

Function adoc of a Web of Linked Data $W = (D, \mathrm{adoc})$ captures the relationship
between the URIs that can be looked up in this Web and the documents that can be
retrieved by such lookups. Since not every URI can be looked up, the function is partial.
For any URI $u \in \mathcal{U}$ with $u \in \mathrm{dom}(\mathrm{adoc})$ (i.e., any URI that can be looked up
in $W$), document $d = \mathrm{adoc}(u)$ can be considered the authoritative source of data for $u$
in $W$ (hence, the name adoc). To accommodate documents that are authoritative
for multiple URIs, we do not require injectivity for function adoc. However, we require
surjectivity because we conceive documents as irrelevant for a Web of Linked Data if
they cannot be retrieved by any URI lookup in this Web.

Let $W = (D, \mathrm{adoc})$ be a Web of Linked Data. $W$ is said to be finite [11] if its set $D$
of documents is finite. In this paper we assume that every Web of Linked Data is finite.
Given documents $d, d' \in D$ and a triple $t \in \mathrm{data}(d)$, we say that a URI $u \in \mathrm{uris}(t)$
establishes a data link from $d$ to $d'$ if $\mathrm{adoc}(u) = d'$. As a final concept, we formalize
the notion of a link graph associated to $W$. This graph has documents in $D$ as nodes,
and directed edges representing data links between documents. Each edge is associated
with a label that identifies both the particular RDF triple and the URI in this triple that
establishes the corresponding data link. These labels shall provide the basis for defining
the navigational component of our query language.

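Definition 2. The link graph of a Web of Linked Data $W = (D, \mathrm{adoc})$ is a directed,
edge-labeled multigraph $\mathcal{G}_W = (D, E_W)$ with set of edges $E_W \subseteq D \times (\mathcal{T} \times \mathcal{U}) \times D$ defined as
$E_W = \{ (d_{src}, (t,u), d_{tgt}) \mid t \in \mathrm{data}(d_{src}),\ u \in \mathrm{uris}(t) \text{ and } d_{tgt} = \mathrm{adoc}(u) \}$.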

For a link graph edge $e = (d_{src}, (t,u), d_{tgt})$, tuple $(t,u)$ is the label of $e$. Moreover,
we sometimes write $e \in \mathcal{G}_W$ to denote that $e$ is an edge in the link graph $\mathcal{G}_W$.

Example 1. As a running example for this paper assume a simple Web of Linked Data
$W_{ex} = (D_{ex}, \mathrm{adoc}_{ex})$ with three documents, $d_A$, $d_B$, and $d_C$ (i.e., $D_{ex} = \{d_A, d_B, d_C\}$).
The data in these documents are the following sets of RDF triples:

$$\mathrm{data}(d_A) = \{(u_A, p_1, u_B), (u_B, p_2, u_C)\}; \quad \mathrm{data}(d_B) = \{(u_B, p_1, u_C)\}; \quad \mathrm{data}(d_C) = \{(u_A, p_2, u_C)\},$$

and for function $\mathrm{adoc}_{ex}$ we have: $\mathrm{adoc}_{ex}(u_A) = d_A$, $\mathrm{adoc}_{ex}(u_B) = d_B$, $\mathrm{adoc}_{ex}(u_C) = d_C$,
and $\mathrm{adoc}_{ex}(p_1) \in D_{ex}$ (i.e., $\mathrm{dom}(\mathrm{adoc}_{ex}) = \{u_A, u_B, u_C, p_1\}$). This Web contains 10
data links. For instance, URI $u_A$ in the RDF triple $(u_A, p_2, u_C) \in \mathrm{data}(d_C)$ establishes
a data link to document $d_A$. Hence, the corresponding edge in the link graph of $W_{ex}$ is
$(d_C, ((u_A, p_2, u_C), u_A), d_A)$. Figure 1 illustrates the link graph with all 10 edges.

Figure 1. The link graph of our example Web of Linked Data $W_{ex}$.
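To make the running example concrete, the following Python sketch encodes $W_{ex}$ and builds its edge set exactly as in Definition 2, confirming the count of 10 data links. URIs and documents are plain strings, and the names `DATA`, `ADOC`, `uris`, and `link_graph` are our own. Because the document to which $p_1$ is mapped is not recoverable here, the sketch hypothetically maps $p_1$ to $d_B$; the edge count does not depend on that choice, since every occurrence of $p_1$ yields exactly one edge whatever its target.

```python
# Sketch of the running example W_ex (Example 1) and its link graph (Definition 2).
# All terms in this example are URIs, so uris(t) is simply the set of terms of t.

DATA = {
    "dA": {("uA", "p1", "uB"), ("uB", "p2", "uC")},
    "dB": {("uB", "p1", "uC")},
    "dC": {("uA", "p2", "uC")},
}

# HYPOTHETICAL: p1 -> "dB" is an illustrative choice; the edge count is the same for any target.
ADOC = {"uA": "dA", "uB": "dB", "uC": "dC", "p1": "dB"}


def uris(t):
    """Set of URIs occurring in triple t (every term is a URI in this example)."""
    return set(t)


def link_graph(data, adoc):
    """E_W = {(d_src, (t, u), d_tgt) | t in data(d_src), u in uris(t), d_tgt = adoc(u)}."""
    return {
        (d_src, (t, u), adoc[u])
        for d_src, triples in data.items()
        for t in triples
        for u in uris(t)
        if u in adoc  # URIs outside dom(adoc) cannot be looked up: no edge
    }


EDGES = link_graph(DATA, ADOC)
print(len(EDGES))  # 10
```

The edge discussed in the example, $(d_C, ((u_A, p_2, u_C), u_A), d_A)$, appears as `("dC", (("uA", "p2", "uC"), "uA"), "dA")`, while $p_2$ never labels an edge target because it is not in $\mathrm{dom}(\mathrm{adoc}_{ex})$.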

3 Definition of LDQL 

This section defines our Linked Data query language, LDQL. LDQL queries are meant 
to be evaluated over a Web of Linked Data and each such query is built from two types 
of components: Link path expressions (LPEs) for selecting query-relevant documents of 
the queried Web of Linked Data; and SPARQL graph patterns for specifying the query 
result that has to be constructed from the data in the selected documents. For this paper, 
we assume that the reader is familiar with the definition of SPARQL [8], including the 
algebraic formalization introduced in [16,2]. In particular, for SPARQL graph patterns 
we closely follow the formalization in [2], considering the operators AND, OPT, UNION, FILTER,
and GRAPH, plus the operator BIND defined in [8]. We begin this section by introducing
the most basic concept of our language, the notion of link patterns. We use link patterns 
as the basis for navigating the link graph of a Web of Linked Data. 

3.1 Link Patterns 

A link pattern is a tuple in $(\mathcal{U} \cup \{\_,+\}) \times (\mathcal{U} \cup \{\_,+\}) \times (\mathcal{U} \cup \mathcal{L} \cup \{\_,+\})$. Link patterns
are used to match link graph edges in the context of a designated context URI. The
special symbol $+$ denotes a placeholder for the context URI. The special symbol $\_$ denotes
a wildcard that will drive the direction of the navigation. Before formalizing how
link graph edges actually match link patterns, we give some intuition. Consider the
link graph of Web $W_{ex}$ in Example 1 (see Fig. 1), and the link pattern $(+, p_1, \_)$. Intuitively,
in the context of URI $u_A$, the edge with label $((u_A, p_1, u_B), u_B)$ from document
$d_A$ to document $d_B$ matches the link pattern $(+, p_1, \_)$. Notice that in the matching,
the context URI $u_A$ takes the place of symbol $+$, and $u_B$ takes the place of the wildcard
symbol $\_$. Notice also that $u_B$ denotes the direction of the edge that matches the link
pattern. On the other hand, the edge with label $((u_A, p_1, u_B), u_A)$ from $d_A$ to $d_A$ does
not match $(+, p_1, \_)$; although $u_B$ can take the place of the wildcard symbol $\_$, the
direction of the edge is not to $u_B$. That is, when matching an edge labeled by $(t,u)$ we
require URI $u$ to be taking the place of a wildcard in the link pattern. When more than
one wildcard symbol is used, the link pattern can be matched by edges pointing to the
direction of any of the URIs taking the place of a wildcard. For instance, in the context
of $u_A$, the link pattern $(\_, p_2, \_)$ is matched by edges $(d_A, ((u_B, p_2, u_C), u_B), d_B)$ and
$(d_A, ((u_B, p_2, u_C), u_C), d_C)$. The next definition formalizes this notion of matching.

Definition 3. A link graph edge with label $((x_1, x_2, x_3), u)$ matches a link pattern
$(y_1, y_2, y_3)$ in the context of a URI $u_{ctx}$ if the following two properties hold:

1. there exists $i \in \{1, 2, 3\}$ such that $y_i = \_$ and $x_i = u$, and

2. for every $i \in \{1, 2, 3\}$, either $y_i = +$ and $x_i = u_{ctx}$, or $y_i = x_i$, or $y_i = \_$.
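Definition 3 translates almost literally into code. The following Python sketch is our own illustration: the symbols $\_$ and $+$ are represented by the strings `"_"` and `"+"` (assuming no URI is spelled that way), and an edge label is a pair of a triple and a URI. It reproduces the matches and the non-match discussed in Section 3.1.

```python
WILDCARD, CONTEXT = "_", "+"  # the special symbols of link patterns


def matches(label, pattern, u_ctx):
    """Definition 3: does the edge label ((x1, x2, x3), u) match the link
    pattern (y1, y2, y3) in the context of URI u_ctx?"""
    xs, u = label
    pairs = list(zip(xs, pattern))
    # 1. the edge's target URI u must occupy a wildcard position
    cond1 = any(y == WILDCARD and x == u for x, y in pairs)
    # 2. every position is + bound to u_ctx, an equal constant, or a wildcard
    cond2 = all(
        (y == CONTEXT and x == u_ctx) or y == x or y == WILDCARD
        for x, y in pairs
    )
    return cond1 and cond2


t1 = ("uA", "p1", "uB")
print(matches((t1, "uB"), (CONTEXT, "p1", WILDCARD), "uA"))  # True
print(matches((t1, "uA"), (CONTEXT, "p1", WILDCARD), "uA"))  # False: uA is not in a wildcard position
t2 = ("uB", "p2", "uC")
print(matches((t2, "uB"), (WILDCARD, "p2", WILDCARD), "uA"))  # True
print(matches((t2, "uC"), (WILDCARD, "p2", WILDCARD), "uA"))  # True
```

Condition 1 is what makes the wildcard "drive the direction": the same triple yields different edges, and only those whose target URI sits under a $\_$ are matched.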

One of the rationales for adopting the notion of a context URI and the $+$ symbol
in our definition of link patterns is to support cases in which link graph navigation
has to be focused solely on data links that are authoritative. A data link represented
by link graph edge $(d_{src}, (t,u), d_{tgt}) \in \mathcal{G}_W$ is authoritative in a Web of Linked Data
$W = (D, \mathrm{adoc})$ if $d_{src} = \mathrm{adoc}(u')$ for some URI $u' \in \mathrm{uris}(t)$. Thus, if we fix a context
URI $u_{ctx}$, a link pattern that uses the $+$ symbol allows us to follow only authoritative
data links from document $d_{ctx} = \mathrm{adoc}(u_{ctx})$: every matching triple contains $u_{ctx}$ and is taken from $\mathrm{adoc}(u_{ctx})$.
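The authoritativeness test can be sketched in a few lines of Python over two edges of $W_{ex}$. The function name is our own; $p_1$ is left out of `ADOC` because neither edge below depends on it, and terms outside $\mathrm{dom}(\mathrm{adoc})$ (including literals) are skipped because `dict.get` returns `None` for them.

```python
ADOC = {"uA": "dA", "uB": "dB", "uC": "dC"}  # p1 omitted: not needed for these edges


def is_authoritative(edge, adoc):
    """An edge (d_src, (t, u), d_tgt) is authoritative if d_src = adoc(u')
    for some URI u' in t; terms outside dom(adoc) map to None and never match."""
    d_src, (t, _u), _d_tgt = edge
    return any(adoc.get(term) == d_src for term in t)


# (uA, p1, uB) is stored in adoc(uA) = dA, so its links from dA are authoritative.
print(is_authoritative(("dA", (("uA", "p1", "uB"), "uB"), "dB"), ADOC))  # True
# (uB, p2, uC) in dA mentions no URI whose authoritative document is dA.
print(is_authoritative(("dA", (("uB", "p2", "uC"), "uC"), "dC"), ADOC))  # False
```

The second edge is one that no pattern containing $+$ can match from context $u_A$, since its triple does not mention $u_A$.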


3.2 LDQL Queries 

The most basic constructions in LDQL queries are tuples of the form $(L, P)$ where $L$
is an expression used to select a set of documents from the Web of Linked Data, and
$P$ is a SPARQL graph pattern to query these documents as if they were a single RDF
dataset. In an abstract setting, one can use any formalism to specify $L$ as long as $L$
defines sets of RDF documents. In our proposal we use what we call link path expressions
(LPEs), which are a form of nested regular expressions [17] over the alphabet of link
patterns. Every link path expression begins its navigation in a context URI, traverses
the Web, and returns a set of URIs; these URIs are used to construct an RDF dataset
with all the documents to be retrieved by looking up the URIs. This dataset is passed
to the SPARQL graph pattern to obtain the final evaluation of the whole query. Besides
the basic constructions of the form $(L, P)$, in LDQL one can also use AND, UNION, and
projection to combine them. We also introduce an operator SEED that is used to dynamically
change, at query time, the seed URI from which the navigation begins. The next
definition formalizes the syntax of LDQL queries and LPEs.

Definition 4. The syntax of LDQL is given by the following production rules in which
$\mathit{lp}$ is an arbitrary link pattern, $?v$ is a variable, $P$ is a SPARQL graph pattern (as per [2]),
$V$ is a finite set of variables, and $U$ is a finite set of URIs:

$$q ::= (\mathit{lpe}, P) \mid (\mathrm{SEED}\ U\ q) \mid (\mathrm{SEED}\ ?v\ q) \mid (q\ \mathrm{AND}\ q) \mid (q\ \mathrm{UNION}\ q) \mid \pi_V\, q$$
$$\mathit{lpe} ::= \varepsilon \mid \mathit{lp} \mid \mathit{lpe}/\mathit{lpe} \mid \mathit{lpe} \mathbin{|} \mathit{lpe} \mid \mathit{lpe}^{*} \mid [\mathit{lpe}] \mid (?v, q)$$

Any expression that satisfies the production $q$ is an LDQL query, any expression that
satisfies the production $\mathit{lpe}$ is a link path expression (LPE), and any LDQL query of
the form $(\mathit{lpe}, P)$ is a basic LDQL query.


Before going into the formal semantics of LDQL and LPEs, we give some more
intuition about how these expressions are evaluated in a Web of Linked Data $W$. As
mentioned before, the most basic expression in LDQL is of the form $(\mathit{lpe}, P)$. To evaluate
this expression over $W$ we need a set $S$ of seed URIs. When evaluating $(\mathit{lpe}, P)$,
every one of the seed URIs in $S$ triggers a navigation of link graph $\mathcal{G}_W$ via the link
path expression $\mathit{lpe}$ starting on that seed. That is, the seed URIs are passed to $\mathit{lpe}$ as
context URIs in which the LPE should be evaluated. These evaluations of $\mathit{lpe}$ result
in a set of URIs that are used to construct a dataset over which $P$ is finally evaluated.

Regarding the navigation of link graph $\mathcal{G}_W$, the most basic form of navigation is to
follow a single link graph edge that matches a link pattern $\mathit{lp}$. When a navigation via
a link pattern $\mathit{lp}$ is triggered from a context URI $u$, we proceed as follows. We first
go to the authoritative document for $u$, that is $\mathrm{adoc}(u)$, and try to find outgoing link
graph edges that match $\mathit{lp}$ in the context of $u$ (as explained in Section 3.1). Every one
of these matches defines a new context URI $u'$ from which the navigation can continue.
More complex forms of navigation are obtained by combining link patterns via classical
regular expression operators such as concatenation $/$, disjunction $|$, and recursive
concatenation $(\cdot)^*$. The nesting operator $[\cdot]$ is used to test for the existence of paths. When
a context URI $u$ is passed to an expression $[\mathit{lpe}]$, it checks whether $\mathcal{G}_W$ contains a path
from $d_{ctx} = \mathrm{adoc}(u)$ that matches $\mathit{lpe}$. If such a path exists, the navigation can continue
from the same context URI $u$. The most involved form of navigation is by using
the expression $(?v, q)$ with $q$ an LDQL query. To evaluate this expression from context
URI $u$ one first has to pass $u$ as a seed URI for $q$ and recursively evaluate $q$ from that
seed. This evaluation generates a set of solution mappings, and for every one of these
mappings its value on variable $?v$ is used as the new context URI from which the navigation
continues. Finally, note that our notion of LPEs does not provide an operator for
navigating paths in their inverse direction. The reason for omitting such an operator is
that traversing arbitrary data links backwards is impossible on the WWW.

To formally define the semantics of LDQL we need to introduce some terminology.
We first define a function $\mathrm{dataset}_W(\cdot)$ that, from a set of URIs, constructs an RDF
dataset with all the documents pointed to by those URIs in $W$. Formally, given a Web of
Linked Data $W = (D, \mathrm{adoc})$ and a set $U$ of URIs, $\mathrm{dataset}_W(U)$ is an RDF dataset (as
per [8,2]) that has the set of triples $\{t \in \mathrm{data}(\mathrm{adoc}(u)) \mid u \in U \cap \mathrm{dom}(\mathrm{adoc})\}$ as
default graph. Moreover, for every URI $u \in U \cap \mathrm{dom}(\mathrm{adoc})$, $\mathrm{dataset}_W(U)$ contains
the named graph $(u, \mathrm{data}(\mathrm{adoc}(u)))$.

Example 2. Consider the Web $W_{ex}$ in Example 1 and the set of URIs $U = \{u_A, u_C\}$.
Then $\mathrm{dataset}_{W_{ex}}(U)$ has $\{(u_A, p_1, u_B), (u_B, p_2, u_C), (u_A, p_2, u_C)\}$ as default graph, and
two named graphs, $(u_A, \{(u_A, p_1, u_B), (u_B, p_2, u_C)\})$ and $(u_C, \{(u_A, p_2, u_C)\})$.
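A minimal Python sketch of $\mathrm{dataset}_W(\cdot)$ reproduces Example 2. The function name and the representation of a dataset as a pair (default graph, dictionary of named graphs) are our own choices. We add $p_2$ to the input set to show that URIs outside $\mathrm{dom}(\mathrm{adoc}_{ex})$ are ignored; $p_1$ is omitted from `ADOC` because it is not needed here.

```python
DATA = {
    "dA": {("uA", "p1", "uB"), ("uB", "p2", "uC")},
    "dB": {("uB", "p1", "uC")},
    "dC": {("uA", "p2", "uC")},
}
ADOC = {"uA": "dA", "uB": "dB", "uC": "dC"}  # p1 omitted: not needed here


def dataset(urls, data, adoc):
    """dataset_W(U): the default graph is the union of data(adoc(u)) for
    u in U ∩ dom(adoc); each such u also yields a named graph (u, data(adoc(u)))."""
    looked_up = [u for u in urls if u in adoc]  # U ∩ dom(adoc)
    default_graph = set().union(*(data[adoc[u]] for u in looked_up))
    named_graphs = {u: data[adoc[u]] for u in looked_up}
    return default_graph, named_graphs


# p2 is not in dom(adoc_ex), so it contributes nothing.
default_graph, named_graphs = dataset({"uA", "uC", "p2"}, DATA, ADOC)
print(sorted(named_graphs))  # ['uA', 'uC']
```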

In the formalization of the semantics of LDQL, we use the standard join operator $\bowtie$
over sets of solution mappings [8,16]. We also make use of the semantics of SPARQL
graph patterns over datasets as defined in [2]. In particular, given an RDF dataset $\mathfrak{D}$, an
RDF graph $G$ in $\mathfrak{D}$, and a SPARQL graph pattern $P$, we denote by $[\![P]\!]^{\mathfrak{D}}_G$ the evaluation
of $P$ over $G$ in $\mathfrak{D}$ [2, Definition 13.3].
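For readers less familiar with the algebraic formalization, the standard join operator $\bowtie$ can be sketched in Python as follows. Representing a solution mapping as a frozenset of (variable, term) pairs is our own choice; it makes mappings hashable so that sets of mappings are ordinary Python sets, and the union of two compatible mappings is simply the union of their pairs.

```python
def mu(bindings):
    """Represent a solution mapping as a frozenset of (variable, term) pairs."""
    return frozenset(bindings.items())


def compatible(m1, m2):
    """Two mappings are compatible if they agree on every variable they share."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(omega1, omega2):
    """Omega1 ⋈ Omega2 = {mu1 ∪ mu2 | mu1 ∈ Omega1, mu2 ∈ Omega2, mu1 and mu2 compatible}."""
    return {m1 | m2 for m1 in omega1 for m2 in omega2 if compatible(m1, m2)}


left = {mu({"?x": "a"}), mu({"?x": "b"})}
right = {mu({"?x": "a", "?y": "c"}), mu({"?z": "d"})}
print(len(join(left, right)))  # 3
```

Mappings with disjoint domains are always compatible, which is why $\{?z \mapsto d\}$ combines with both mappings on the left.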

We are now ready to formally define the semantics of LDQL and LPEs. Given a Web
of Linked Data $W$ and a set $S$ of URIs, we formalize the evaluation of LDQL queries
over $W$ from the seed URIs $S$ as a function $[\![\cdot]\!]^S_W$ that, given an LDQL query, produces
a set of solution mappings. Similarly, the evaluation of LPEs over $W$ from a context
URI $u$ is formalized as a function $[\![\cdot]\!]^u_W$ that, given an LPE, produces a set of URIs.


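Definition 5. Given a finite set $S \subseteq \mathcal{U}$, the $S$-based evaluation of LDQL queries over a
Web of Linked Data $W = (D, \mathrm{adoc})$, denoted by $[\![\cdot]\!]^S_W$, is defined recursively as follows:

$$
\begin{aligned}
[\![(\mathit{lpe}, P)]\!]^S_W &= [\![P]\!]^{\mathfrak{D}}_G \text{ where } \mathfrak{D} = \mathrm{dataset}_W\Big(\bigcup_{u \in S} [\![\mathit{lpe}]\!]^u_W\Big) \text{ with default graph } G,\\
[\![(\mathrm{SEED}\ U\ q)]\!]^S_W &= [\![q]\!]^U_W,\\
[\![(\mathrm{SEED}\ ?v\ q)]\!]^S_W &= \bigcup_{u \in \mathcal{U}} \big([\![q]\!]^{\{u\}}_W \bowtie \{\mu_u\}\big) \text{ where } \mu_u = \{?v \mapsto u\} \text{ for all } u \in \mathcal{U},\\
[\![(q_1\ \mathrm{UNION}\ q_2)]\!]^S_W &= [\![q_1]\!]^S_W \cup [\![q_2]\!]^S_W,\\
[\![(q_1\ \mathrm{AND}\ q_2)]\!]^S_W &= [\![q_1]\!]^S_W \bowtie [\![q_2]\!]^S_W,\\
[\![\pi_V\, q]\!]^S_W &= \{\mu \mid \text{there exists } \mu' \in [\![q]\!]^S_W \text{ such that } \mu \text{ and } \mu' \text{ are compatible and } \mathrm{dom}(\mu) = \mathrm{dom}(\mu') \cap V\}.
\end{aligned}
$$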


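Now for the semantics of LPEs, given a context URI $u_{ctx} \in \mathrm{dom}(\mathrm{adoc})$, the $u_{ctx}$-based
evaluation of LPEs over $W$, denoted by $[\![\cdot]\!]^{u_{ctx}}_W$, is defined recursively as follows:

$$
\begin{aligned}
[\![\varepsilon]\!]^{u_{ctx}}_W &= \{u_{ctx}\},\\
[\![\mathit{lp}]\!]^{u_{ctx}}_W &= \{u \in \mathcal{U} \mid \text{there exists a link graph edge } (d_{src}, (t,u), d_{tgt}) \in \mathcal{G}_W \text{ with } d_{src} = \mathrm{adoc}(u_{ctx}) \text{ that matches } \mathit{lp} \text{ in the context of } u_{ctx}\},\\
[\![\mathit{lpe}_1/\mathit{lpe}_2]\!]^{u_{ctx}}_W &= \{u \in [\![\mathit{lpe}_2]\!]^{u'}_W \mid u' \in [\![\mathit{lpe}_1]\!]^{u_{ctx}}_W\},\\
[\![\mathit{lpe}_1 \mathbin{|} \mathit{lpe}_2]\!]^{u_{ctx}}_W &= [\![\mathit{lpe}_1]\!]^{u_{ctx}}_W \cup [\![\mathit{lpe}_2]\!]^{u_{ctx}}_W,\\
[\![\mathit{lpe}^*]\!]^{u_{ctx}}_W &= \{u_{ctx}\} \cup [\![\mathit{lpe}]\!]^{u_{ctx}}_W \cup [\![\mathit{lpe}/\mathit{lpe}]\!]^{u_{ctx}}_W \cup [\![\mathit{lpe}/\mathit{lpe}/\mathit{lpe}]\!]^{u_{ctx}}_W \cup \cdots,\\
[\![[\mathit{lpe}]]\!]^{u_{ctx}}_W &= \{u_{ctx} \mid [\![\mathit{lpe}]\!]^{u_{ctx}}_W \neq \emptyset\},\\
[\![(?v, q)]\!]^{u_{ctx}}_W &= \{u \in \mathcal{U} \mid \text{there exists } \mu \in [\![q]\!]^{\{u_{ctx}\}}_W \text{ such that } \mu(?v) = u\}.
\end{aligned}
$$

Moreover, if $u_{ctx} \notin \mathrm{dom}(\mathrm{adoc})$, then $[\![\mathit{lpe}]\!]^{u_{ctx}}_W = \emptyset$ for every LPE.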
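The LPE part of Definition 5 can be sketched as a small recursive evaluator over $W_{ex}$. This is our own illustration, not the authors' implementation: LPEs are encoded as nested tuples, $(\cdot)^*$ is computed as a fixpoint (which terminates because only finitely many URIs occur in a finite Web), and the construction $(?v, q)$ is deliberately not covered because it requires full LDQL query evaluation. $p_1$ is omitted from `ADOC`; for the LPE below it never ends up in a wildcard position, so the result is unaffected.

```python
WILDCARD, CONTEXT = "_", "+"

DATA = {
    "dA": {("uA", "p1", "uB"), ("uB", "p2", "uC")},
    "dB": {("uB", "p1", "uC")},
    "dC": {("uA", "p2", "uC")},
}
ADOC = {"uA": "dA", "uB": "dB", "uC": "dC"}  # p1 omitted: irrelevant for lpe_ex


def matches(label, pattern, u_ctx):
    """Definition 3: edge label ((x1, x2, x3), u) vs. link pattern (y1, y2, y3)."""
    xs, u = label
    pairs = list(zip(xs, pattern))
    return any(y == WILDCARD and x == u for x, y in pairs) and all(
        (y == CONTEXT and x == u_ctx) or y == x or y == WILDCARD for x, y in pairs
    )


def eval_lpe(lpe, u_ctx):
    """u_ctx-based evaluation of an LPE encoded as ("eps",), ("lp", pattern),
    ("seq", l1, l2), ("alt", l1, l2), ("star", l) or ("test", l)."""
    if u_ctx not in ADOC:  # u_ctx outside dom(adoc): empty result
        return set()
    kind = lpe[0]
    if kind == "eps":
        return {u_ctx}
    if kind == "lp":  # targets of matching edges leaving adoc(u_ctx)
        return {u for t in DATA[ADOC[u_ctx]] for u in t
                if u in ADOC and matches((t, u), lpe[1], u_ctx)}
    if kind == "seq":
        return {u for u1 in eval_lpe(lpe[1], u_ctx) for u in eval_lpe(lpe[2], u1)}
    if kind == "alt":
        return eval_lpe(lpe[1], u_ctx) | eval_lpe(lpe[2], u_ctx)
    if kind == "star":  # {u_ctx} ∪ [[l]] ∪ [[l/l]] ∪ ... as a fixpoint
        reached, frontier = {u_ctx}, {u_ctx}
        while frontier:
            frontier = {u for u1 in frontier for u in eval_lpe(lpe[1], u1)} - reached
            reached |= frontier
        return reached
    if kind == "test":
        return {u_ctx} if eval_lpe(lpe[1], u_ctx) else set()
    raise ValueError(f"not covered by this sketch: {lpe!r}")  # e.g. (?v, q)


# lpe_ex = (_, p1, _)* / [(_, p2, _)] from Example 3
LPE_EX = ("seq", ("star", ("lp", ("_", "p1", "_"))), ("test", ("lp", ("_", "p2", "_"))))
print(sorted(eval_lpe(LPE_EX, "uA")))  # ['uA', 'uC']
```

The star alone reaches $\{u_A, u_B, u_C\}$ from $u_A$; the test then discards $u_B$, whose document $d_B$ has no $p_2$ triple, matching Example 3.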

Example 3. Let $\mathit{lpe}_{ex}$ be the LPE $(\_, p_1, \_)^*/[(\_, p_2, \_)]$. This LPE selects documents
that can be reached via arbitrarily long paths of data links with predicate $p_1$
and, additionally, have some outgoing data link with predicate $p_2$. For our example
Web $W_{ex}$ and context URI $u_A$, the LPE selects documents $d_A = \mathrm{adoc}_{ex}(u_A)$ and
$d_C = \mathrm{adoc}_{ex}(u_C)$. More precisely, we have $[\![\mathit{lpe}_{ex}]\!]^{u_A}_{W_{ex}} = \{u_A, u_C\}$. Note that document
$d_B$ can also be reached via a $p_1$-path, but it does not pass the $p_2$-related test.


Example 4. Consider a set of URIs $S_{ex} = \{u_A\}$ and a basic LDQL query $(\mathit{lpe}_{ex}, B_{ex})$
whose LPE is $\mathit{lpe}_{ex}$ as introduced in Example 3 and whose SPARQL graph pattern is a
basic graph pattern that contains two triple patterns, $B_{ex} = \{(?x, p_1, ?y), (?x, p_2, ?z)\}$.
Given that we have $[\![\mathit{lpe}_{ex}]\!]^{u_A}_{W_{ex}} = \{u_A, u_C\}$ (cf. Example 3), $\mathrm{dataset}_{W_{ex}}([\![\mathit{lpe}_{ex}]\!]^{u_A}_{W_{ex}})$
has the default graph $\{(u_A, p_1, u_B), (u_B, p_2, u_C), (u_A, p_2, u_C)\}$ (cf. Example 2). Then,
according to the query semantics, the result of query $(\mathit{lpe}_{ex}, B_{ex})$ over $W_{ex}$ using seeds
$S_{ex}$ consists of a single solution mapping, namely $\mu = \{?x \mapsto u_A, ?y \mapsto u_B, ?z \mapsto u_C\}$.
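The last step of Example 4, evaluating $B_{ex}$ over the default graph, can be checked with a small Python sketch. This is a plain set-semantics evaluator for basic graph patterns that we wrote for illustration, not the full SPARQL algebra of [2]; variables are strings starting with `?`, and the default graph is copied from the example rather than recomputed.

```python
def is_var(term):
    return term.startswith("?")


def eval_bgp(bgp, graph):
    """Evaluate a basic graph pattern over a set of triples by extending
    partial mappings one triple pattern at a time."""
    solutions = [{}]
    for tp in bgp:
        extended = []
        for mu in solutions:
            for triple in graph:
                candidate, ok = dict(mu), True
                for pat, term in zip(tp, triple):
                    if is_var(pat):
                        # bind the variable, or check it agrees with an earlier binding
                        ok = ok and candidate.setdefault(pat, term) == term
                    else:
                        ok = ok and pat == term
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


DEFAULT_GRAPH = {("uA", "p1", "uB"), ("uB", "p2", "uC"), ("uA", "p2", "uC")}
B_EX = [("?x", "p1", "?y"), ("?x", "p2", "?z")]
print(eval_bgp(B_EX, DEFAULT_GRAPH))  # [{'?x': 'uA', '?y': 'uB', '?z': 'uC'}]
```

The shared variable $?x$ is what rules out $(u_B, p_2, u_C)$: after the first triple pattern binds $?x$ to $u_A$, only $(u_A, p_2, u_C)$ can extend the mapping.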


Example 5. Consider an LDQL query $q_{ex} = (\mathrm{SEED}\ ?x\ (\varepsilon, (?x, p_1, ?w)))$ whose subquery
is a basic LDQL query that has a single triple pattern as its SPARQL graph
pattern. Additionally, let $q'_{ex} = (\mathit{lpe}_{ex}, \{(?x, p_1, ?y), (?x, p_2, ?z)\})$ be the basic LDQL
query introduced in Example 4, and let $q''_{ex}$ be the conjunction of these two queries; i.e.,
$q''_{ex} = (q_{ex}\ \mathrm{AND}\ q'_{ex})$. By Example 4 we know that $[\![q'_{ex}]\!]^{S_{ex}}_{W_{ex}} = \{\mu\}$ with $\mu = \{?x \mapsto u_A, ?y \mapsto u_B, ?z \mapsto u_C\}$.
Furthermore, based on the data given in Example 1, it is easy to
see that $[\![q_{ex}]\!]^{S_{ex}}_{W_{ex}} = \{\mu_1, \mu_2\}$ with $\mu_1 = \{?x \mapsto u_A, ?w \mapsto u_B\}$ and $\mu_2 = \{?x \mapsto u_B, ?w \mapsto u_C\}$.
For the $S_{ex}$-based evaluation of $q''_{ex}$ over $W_{ex}$, the result sets $[\![q_{ex}]\!]^{S_{ex}}_{W_{ex}}$ and
$[\![q'_{ex}]\!]^{S_{ex}}_{W_{ex}}$ have to be joined. Thus, we need to compute $\{\mu_1, \mu_2\} \bowtie \{\mu\}$, which results in
a single mapping $\mu' = \mu_1 \cup \mu = \{?x \mapsto u_A, ?w \mapsto u_B, ?y \mapsto u_B, ?z \mapsto u_C\}$.
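The following Python sketch replays Example 5: it evaluates $q_{ex}$ by trying every URI as a seed for $?x$ and joins the outcome with the mapping $\mu$ from Example 4. Two simplifications are ours and are flagged in the code. First, only URIs in $\mathrm{dom}(\mathrm{adoc}_{ex})$ are tried as seeds; for this particular subquery any other URI yields an empty dataset and hence no solutions, but this shortcut is not valid for arbitrary $(\mathrm{SEED}\ ?v\ q)$ queries. Second, $p_1$ is hypothetically mapped to $d_B$; mapping it to any document gives the same result, because the join with $\{?x \mapsto p_1\}$ always fails.

```python
DATA = {
    "dA": {("uA", "p1", "uB"), ("uB", "p2", "uC")},
    "dB": {("uB", "p1", "uC")},
    "dC": {("uA", "p2", "uC")},
}
# HYPOTHETICAL: p1 -> "dB"; any target document gives the same result here.
ADOC = {"uA": "dA", "uB": "dB", "uC": "dC", "p1": "dB"}


def mu(d):
    """A solution mapping as a hashable frozenset of (variable, term) pairs."""
    return frozenset(d.items())


def eval_tp(tp, graph):
    """All mappings of a single triple pattern over a set of triples."""
    out = set()
    for triple in graph:
        binding, ok = {}, True
        for pat, term in zip(tp, triple):
            if pat.startswith("?"):
                ok = ok and binding.setdefault(pat, term) == term
            else:
                ok = ok and pat == term
        if ok:
            out.add(mu(binding))
    return out


def join(omega1, omega2):
    def compatible(m1, m2):
        d1, d2 = dict(m1), dict(m2)
        return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())
    return {m1 | m2 for m1 in omega1 for m2 in omega2 if compatible(m1, m2)}


def seed_var_eps(var, tp):
    """(SEED var (eps, tp)): union over seeds u of [[(eps, tp)]]^{u} ⋈ {var -> u}.
    SIMPLIFICATION: only seeds in dom(adoc) are tried (sound for this subquery only)."""
    result = set()
    for u in ADOC:
        graph = DATA[ADOC[u]]  # default graph of dataset_W([[eps]]^u) = data(adoc(u))
        result |= join(eval_tp(tp, graph), {mu({var: u})})
    return result


Q_EX = seed_var_eps("?x", ("?x", "p1", "?w"))
MU = mu({"?x": "uA", "?y": "uB", "?z": "uC"})  # the result of Example 4
print(join(Q_EX, {MU}) == {mu({"?x": "uA", "?w": "uB", "?y": "uB", "?z": "uC"})})  # True
```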

3.3 Algebraic Properties of LDQL Queries 

As a basis for the discussion in the next sections, we show some simple algebraic properties.
We say that LDQL queries $q$ and $q'$ are semantically equivalent, denoted by $q \equiv q'$,
if $[\![q]\!]^S_W = [\![q']\!]^S_W$ holds for every Web of Linked Data $W$ and every finite set $S \subseteq \mathcal{U}$.

Lemma 1. The operators AND and UNION are associative and commutative.

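Lemma 2. Let $q_1, q_2, q_3$ be LDQL queries. The following semantic equivalences hold:

$$(q_1\ \mathrm{AND}\ (q_2\ \mathrm{UNION}\ q_3)) \equiv ((q_1\ \mathrm{AND}\ q_2)\ \mathrm{UNION}\ (q_1\ \mathrm{AND}\ q_3)) \tag{1}$$

$$\pi_V (q_1\ \mathrm{UNION}\ q_2) \equiv (\pi_V q_1\ \mathrm{UNION}\ \pi_V q_2) \tag{2}$$

$$(\mathrm{SEED}\ U\ (q_1\ \mathrm{UNION}\ q_2)) \equiv ((\mathrm{SEED}\ U\ q_1)\ \mathrm{UNION}\ (\mathrm{SEED}\ U\ q_2)) \tag{3}$$

$$(\mathrm{SEED}\ ?v\ (q_1\ \mathrm{UNION}\ q_2)) \equiv ((\mathrm{SEED}\ ?v\ q_1)\ \mathrm{UNION}\ (\mathrm{SEED}\ ?v\ q_2)) \tag{4}$$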


Lemma 1 allows us to write sequences of either AND or UNION without parentheses.
Our next result shows the power of the construction $(?v, q)$. In particular, it shows the
somewhat surprising finding that link patterns $\mathit{lp}$, concatenation $/$, disjunction $|$, and the
test $[\cdot]$ are just syntactic sugar, as they can be simulated by using $\varepsilon$, $(?v, q)$, and $(\cdot)^*$.

Proposition 1. For every LDQL query $q$, there exists an LDQL query $q'$ such that $q \equiv q'$ and
every LPE in $q'$ consists only of the symbol $\varepsilon$, the construction $(?v, q)$, and the operator $(\cdot)^*$.

Proof (Sketch). The proof is based on a recursive translation of link path expressions
beginning with link patterns. For instance, a link pattern of the form $(+, p, \_)$ is encoded
by $(?v, (\varepsilon, (\mathrm{GRAPH}\ ?u\ \{(?u, p, ?v)\})))$, and we can similarly encode all types of link
patterns. To encode $/$ we make use of $(?v, q)$ and the operator AND inside $q$ as follows.
Consider an LPE $r = r_1/r_2$. It can be shown that $r$ is equivalent to $(?v, q)$ where $q$ is:

$$\big((r_1, (\mathrm{GRAPH}\ ?x\ \{\,\}))\ \mathrm{AND}\ (\mathrm{SEED}\ ?x\ (r_2, (\mathrm{GRAPH}\ ?v\ \{\,\})))\big).$$

Similarly, to encode $|$ we make use of UNION and to encode $[\cdot]$ we use projection.

Although not strictly necessary, we decided to keep link patterns and operators $/$, $|$,
and $[\cdot]$ since they represent a natural and intuitive way of expressing navigation paths.


4 Comparison with Previous Linked Data Query Formalisms 

In this section, we compare LDQL with alternative formalisms to query Linked Data on
the WWW. There are some general query languages for the WWW (proposed before
the advent of Linked Data) that are related to our proposal; in particular, WebSQL [15],
which is similar in spirit to LDQL but different in the features that the languages possess.
Two main novelties of LDQL compared with WebSQL are the possibility to dynamically
select seed URIs at query time, and the traversal of links according to properties
of the queried documents that can be defined in the same LDQL query. Neither of these
is expressible in WebSQL. While a complete formal comparison between LDQL and
WebSQL is certainly very interesting, we leave it for future work and, instead, focus on
three more recent proposals of query formalisms for the Web of Linked Data [7,11,14].
We formally show that LDQL is strictly more expressive than every one of them.

4.1 Comparison with Property Paths under Context-Based Query Semantics 

Property paths (PPs for short) were introduced in SPARQL 1.1 as a way of adding
navigational power to the language [8]. PPs are a form of regular expressions that are
evaluated over a single (local) RDF graph; a PP expression is used to retrieve pairs $(a, b)$
of nodes in the graph such that there is a path from $a$ to $b$ whose sequence of edge labels
belongs (as a string) to the regular language defined by the expression. The syntax of
PP expressions is given by the following grammar¹, where $p, u_1, u_2, \ldots, u_k$ are URIs.

$$\mathit{pe} ::= p \mid {!}(u_1 \mathbin{|} u_2 \mathbin{|} \cdots \mathbin{|} u_k) \mid \mathit{pe}/\mathit{pe} \mid \mathit{pe} \mathbin{|} \mathit{pe} \mid \mathit{pe}^*$$

A PP-pattern is defined as a tuple of the form $(\alpha, \mathit{pe}, \beta)$ where $\mathit{pe}$ is a PP expression,
and $\alpha$ and $\beta$ are in $\mathcal{U} \cup \mathcal{L} \cup \mathcal{V}$.

In [14] the authors adapted the semantics of PP-patterns so that they can be used
to query the Web of Linked Data. The proposed query semantics is called context-based
semantics [14]. To define this semantics, the authors first introduce the notion
of a context selector for a Web of Linked Data $W$. This context selector is a function
$C_W(\cdot)$ that, given a URI $u \in \mathrm{dom}(\mathrm{adoc})$, returns the RDF triples in $\mathrm{data}(\mathrm{adoc}(u))$
that have $u$ in the subject position. Formally, for every URI $u \in \mathrm{dom}(\mathrm{adoc})$ we have
$C_W(u) = \{(s,p,o) \in \mathrm{data}(\mathrm{adoc}(u)) \mid s = u\}$. To simplify the exposition, the authors
extended the definition of $C_W(\cdot)$ to also handle URIs not in $\mathrm{dom}(\mathrm{adoc})$, as well as literals
and blank nodes. For any such RDF term $a$ they define $C_W(a)$ as the empty set.
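A short Python sketch of the context selector over $W_{ex}$ shows what context-based semantics can and cannot see. The function name and the literal `'"42"'` are our own illustrative choices, and $p_1$ is omitted from `ADOC` because it is not needed here. Note that the triple $(u_B, p_2, u_C)$ stored in $d_A$ is never selected: $C_W(u_A)$ looks only at subject $u_A$, and $C_W(u_B)$ looks only in $\mathrm{adoc}(u_B) = d_B$.

```python
DATA = {
    "dA": {("uA", "p1", "uB"), ("uB", "p2", "uC")},
    "dB": {("uB", "p1", "uC")},
    "dC": {("uA", "p2", "uC")},
}
ADOC = {"uA": "dA", "uB": "dB", "uC": "dC"}  # p1 omitted: not needed here


def context_selector(a, data, adoc):
    """C_W(a): the triples of adoc(a) that have a in subject position; empty
    for every term outside dom(adoc) (other URIs, literals, blank nodes)."""
    if a not in adoc:
        return set()
    return {t for t in data[adoc[a]] if t[0] == a}


print(context_selector("uA", DATA, ADOC))    # {('uA', 'p1', 'uB')}
print(context_selector("uB", DATA, ADOC))    # {('uB', 'p1', 'uC')}
print(context_selector('"42"', DATA, ADOC))  # set()
```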

The context-based semantics for PPs over the Web of Linked Data in [14] is a bag
semantics that closely follows the semantics for PPs defined in the normative semantics
of SPARQL 1.1 [8]. Hence, both semantics use a procedure, the ArbitraryLengthPath
procedure [8], to define the semantics of the $(\cdot)^*$ operator. It was shown in [1] that for
set semantics, the normative semantics of PPs can be defined by using standard techniques
for regular expressions. To make the comparison with LDQL, in this paper we
adapt the context-based semantics for PPs presented in [14] by following the techniques
in [1], and consider only sets of mappings. To this end, we define a function $[\![\cdot]\!]^{\mathrm{ctx}}_W$ that,
given a PP-pattern, returns its evaluation under context-based semantics over the Web
of Linked Data $W$. In the definition, for a solution mapping $\mu$ and an RDF term $a$, we
use the notation $\mu[a]$ with the following meaning: $\mu[a] = \mu(a)$ if $a \in \mathrm{dom}(\mu)$, and
$\mu[a] = a$ otherwise. Similarly, $\mu[(s,p,o)] = (\mu[s], \mu[p], \mu[o])$.

¹ In [14] the reverse path construction ^pe is also considered. We do not consider it here as the
form of navigation of these reverse paths does not represent a traversal of the link graph.

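$$
\begin{aligned}
[\![(\alpha, p, \beta)]\!]^{\mathrm{ctx}}_W &= \{\mu \mid \mathrm{dom}(\mu) = \{\alpha, \beta\} \cap \mathcal{V} \text{ and } \mu[(\alpha, p, \beta)] \in C_W(\mu[\alpha])\}\\
[\![(\alpha, {!}(u_1 \mathbin{|} \cdots \mathbin{|} u_k), \beta)]\!]^{\mathrm{ctx}}_W &= \{\mu \mid \mathrm{dom}(\mu) = \{\alpha, \beta\} \cap \mathcal{V} \text{ and there exists } p \text{ s.t. } \mu[(\alpha, p, \beta)] \in C_W(\mu[\alpha]) \text{ and } p \notin \{u_1, \ldots, u_k\}\}\\
[\![(\alpha, \mathit{pe}_1/\mathit{pe}_2, \beta)]\!]^{\mathrm{ctx}}_W &= \pi_{\{\alpha, \beta\} \cap \mathcal{V}}\big([\![(\alpha, \mathit{pe}_1, ?v)]\!]^{\mathrm{ctx}}_W \bowtie [\![(?v, \mathit{pe}_2, \beta)]\!]^{\mathrm{ctx}}_W\big)\\
[\![(\alpha, \mathit{pe}_1 \mathbin{|} \mathit{pe}_2, \beta)]\!]^{\mathrm{ctx}}_W &= [\![(\alpha, \mathit{pe}_1, \beta)]\!]^{\mathrm{ctx}}_W \cup [\![(\alpha, \mathit{pe}_2, \beta)]\!]^{\mathrm{ctx}}_W\\
[\![(\alpha, \mathit{pe}^*, \beta)]\!]^{\mathrm{ctx}}_W &= \{\mu \mid \mathrm{dom}(\mu) = \{\alpha, \beta\} \cap \mathcal{V},\ \mu[\alpha] = \mu[\beta] \text{ and } \mu[\alpha] \in \mathrm{terms}(W)\}\\
&\quad \cup [\![(\alpha, \mathit{pe}, \beta)]\!]^{\mathrm{ctx}}_W \cup [\![(\alpha, \mathit{pe}/\mathit{pe}, \beta)]\!]^{\mathrm{ctx}}_W \cup [\![(\alpha, \mathit{pe}/\mathit{pe}/\mathit{pe}, \beta)]\!]^{\mathrm{ctx}}_W \cup \cdots
\end{aligned}
$$

In the rule for $\mathit{pe}_1/\mathit{pe}_2$, $?v$ is a fresh variable.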

A PP-based SPARQL query [14] is an expression formed by combining PP-patterns
using the standard SPARQL operators AND, UNION, OPT, FILTER, and so on, following the
standard semantics for these operators [2]. Our next results show that LDQL is strictly 
more expressive than PP-based SPARQL queries under context-based semantics. 

Theorem 1. There exists an LDQL query that cannot be expressed as a PP-based 
SPARQL query under context-based semantics. 

Proof. One can show that the LDQL query $q = (\mathrm{SEED}\ U\ ((+, \_, \_), (?x, ?x, ?x)))$
with $U = \{u\}$ cannot be expressed by PPs under context-based semantics because
this semantics is "blind" to triples that are not authoritative. For instance, in a Web
$W = (\{d, d'\}, \mathrm{adoc})$ with $\mathrm{data}(d) = \{(u, p, u')\}$, $\mathrm{data}(d') = \{(u', p, u), (u, u, u)\}$,
$\mathrm{adoc}(u) = d$, and $\mathrm{adoc}(u') = d'$, the evaluation of $q$ consists of the single solution mapping $\{?x \mapsto u\}$.
Notice that the only authoritative triple in $d'$ is $(u', p, u)$, as $d' = \mathrm{adoc}(u') \neq \mathrm{adoc}(u)$.
Hence, one can prove that PP-based SPARQL queries under context-based semantics
cannot access triple $(u, u, u)$ in $d'$, and thus will never have $\{?x \mapsto u\}$ as a solution.
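The "blindness" argument can be checked mechanically on the two-document Web of the proof. In the following Python sketch (our own illustration), `u2` and `d2` stand for $u'$ and $d'$. Every triple that context-based semantics can touch lies in some $C_W(a)$, and terms outside $\mathrm{dom}(\mathrm{adoc})$ contribute nothing, so collecting $C_W(a)$ over $\mathrm{dom}(\mathrm{adoc})$ gives all visible triples.

```python
# The Web from the proof of Theorem 1; "u2" and "d2" stand for u' and d'.
DATA = {"d": {("u", "p", "u2")}, "d2": {("u2", "p", "u"), ("u", "u", "u")}}
ADOC = {"u": "d", "u2": "d2"}


def context_selector(a):
    """C_W(a): triples of adoc(a) with a in subject position (empty outside dom(adoc))."""
    if a not in ADOC:
        return set()
    return {t for t in DATA[ADOC[a]] if t[0] == a}


# All triples that context-based semantics can ever look at.
visible = set().union(*(context_selector(a) for a in ADOC))
print(("u", "u", "u") in visible)     # False: never selected by any C_W(a)
print(("u", "u", "u") in DATA["d2"])  # True: yet it is stored in d'
```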

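Theorem 2. Let $\alpha, \beta \in \mathcal{U} \cup \mathcal{L} \cup \mathcal{V}$. Then, for every PP-pattern $(\alpha, \mathit{pe}, \beta)$, there exists
an LDQL query $q$ such that $[\![(\alpha, \mathit{pe}, \beta)]\!]^{\mathrm{ctx}}_W = [\![q]\!]^{\emptyset}_W$ for every Web of Linked Data $W$.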

Proof (Sketch). In the proof we provide a translation scheme from PPs to LDQL. One
major complication is that PPs can retrieve literals and, in general, values that are not in
$\mathrm{dom}(\mathrm{adoc})$, which are difficult to handle by LPEs. For every PP-pattern $(?x, \mathit{pe}, ?y)$ we
construct an LDQL query $Q_{\mathit{pe}}(?x, ?y)$. For example, for $(?x, \mathit{pe}_1/\mathit{pe}_2, ?y)$, our query is
$\pi_{\{?x, ?y\}}(Q_{\mathit{pe}_1}(?x, ?z)\ \mathrm{AND}\ Q_{\mathit{pe}_2}(?z, ?y))$, and for $(?x, {!}(u_1 \mathbin{|} \cdots \mathbin{|} u_k), ?y)$ the translation
is $(\mathrm{SEED}\ ?x\ (\varepsilon, ((?x, ?p, ?y)\ \mathrm{FILTER}\ (?p \neq u_1 \wedge \cdots \wedge ?p \neq u_k))))$. To handle $\mathit{pe}^*$ we
need to use the construction $(?v, q)$ of LPEs, plus $(\cdot)^*$.

4.2 Comparison with NautiLOD 

NautiLOD is a navigation language to traverse Linked Data on the WWW and to perform
actions (such as sending emails) during the traversal [7]. We compare LDQL with
NautiLOD without action rules. The syntax of NautiLOD expressions (without actions)
is given by the following grammar (where $p \in \mathcal{U}$ and $P$ is a SPARQL graph pattern).

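$$\mathit{ne} ::= p \mid p^{-} \mid \langle \_ \rangle \mid \mathit{ne}/\mathit{ne} \mid \mathit{ne} \mathbin{|} \mathit{ne} \mid \mathit{ne}^* \mid \mathit{ne}[(\mathrm{ASK}\ P)]$$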


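In terms of our data model², the semantics of NautiLOD expressions over a Web of
Linked Data $W = (D, \mathrm{adoc})$ from URI $u \in \mathrm{dom}(\mathrm{adoc})$ is defined recursively as follows.

$$
\begin{aligned}
[\![p]\!]^u_W &= \{u' \mid (u, p, u') \in \mathrm{data}(\mathrm{adoc}(u))\}\\
[\![p^-]\!]^u_W &= \{u' \mid (u', p, u) \in \mathrm{data}(\mathrm{adoc}(u))\}\\
[\![\langle \_ \rangle]\!]^u_W &= \{u' \mid (u, p, u') \in \mathrm{data}(\mathrm{adoc}(u)) \text{ for some } p \in \mathcal{U}\}\\
[\![\mathit{ne}_1/\mathit{ne}_2]\!]^u_W &= \{u'' \mid u'' \in [\![\mathit{ne}_2]\!]^{u'}_W \text{ for some } u' \in [\![\mathit{ne}_1]\!]^u_W \text{ with } u' \in \mathrm{dom}(\mathrm{adoc})\}\\
[\![\mathit{ne}_1 \mathbin{|} \mathit{ne}_2]\!]^u_W &= [\![\mathit{ne}_1]\!]^u_W \cup [\![\mathit{ne}_2]\!]^u_W\\
[\![\mathit{ne}^*]\!]^u_W &= \{u\} \cup [\![\mathit{ne}]\!]^u_W \cup [\![\mathit{ne}/\mathit{ne}]\!]^u_W \cup [\![\mathit{ne}/\mathit{ne}/\mathit{ne}]\!]^u_W \cup \cdots\\
[\![\mathit{ne}[(\mathrm{ASK}\ P)]]\!]^u_W &= \{u' \mid u' \in [\![\mathit{ne}]\!]^u_W,\ u' \in \mathrm{dom}(\mathrm{adoc}) \text{ and } [\![P]\!]_{\mathrm{data}(\mathrm{adoc}(u'))} \neq \emptyset\}
\end{aligned}
$$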
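To contrast NautiLOD navigation with LPEs, the following Python sketch evaluates the fragment $p$, $p^-$, $\langle \_ \rangle$, $/$, and $|$ over $W_{ex}$. The tuple encoding is our own, and $(\cdot)^*$ and $[(\mathrm{ASK}\ P)]$ are left out for brevity. $p_1$ is omitted from `ADOC` because it is never used as a starting point below. Note that $\langle \_ \rangle$ from $u_A$ returns only $u_B$: the triple $(u_B, p_2, u_C)$ in $d_A$ does not have $u_A$ as subject.

```python
DATA = {
    "dA": {("uA", "p1", "uB"), ("uB", "p2", "uC")},
    "dB": {("uB", "p1", "uC")},
    "dC": {("uA", "p2", "uC")},
}
ADOC = {"uA": "dA", "uB": "dB", "uC": "dC"}  # p1 omitted: never looked up below


def eval_ne(ne, u):
    """NautiLOD semantics for p, p^-, <_>, ne1/ne2 and ne1|ne2 from a URI u
    in dom(adoc); expressions are nested tuples (our own encoding)."""
    triples = DATA[ADOC[u]]
    kind = ne[0]
    if kind == "pred":  # p
        return {o for (s, p, o) in triples if s == u and p == ne[1]}
    if kind == "inv":   # p^-
        return {s for (s, p, o) in triples if o == u and p == ne[1]}
    if kind == "any":   # <_>
        return {o for (s, p, o) in triples if s == u}
    if kind == "seq":   # ne1/ne2 continues only from URIs in dom(adoc)
        return {u2 for u1 in eval_ne(ne[1], u) if u1 in ADOC for u2 in eval_ne(ne[2], u1)}
    if kind == "alt":   # ne1|ne2
        return eval_ne(ne[1], u) | eval_ne(ne[2], u)
    raise ValueError(f"not covered by this sketch: {ne!r}")


print(eval_ne(("pred", "p1"), "uA"))                           # {'uB'}
print(eval_ne(("seq", ("pred", "p1"), ("pred", "p1")), "uA"))  # {'uC'}
print(eval_ne(("inv", "p2"), "uC"))                            # {'uA'}
```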

We next show that for every NautiLOD expression there exists an equivalent LDQL
query. Notice that the evaluation of a NautiLOD expression is a set of URIs, whereas the
evaluation of an LDQL query is a set of mappings. Thus, to formally state our result we
compare NautiLOD with LDQL queries that have a single free variable. Let $q(?x)$ be an
LDQL query with $?x$ as free variable. We say that $q(?x)$ and a NautiLOD expression $\mathit{ne}$
are equivalent if for every Web of Linked Data $W = (D, \mathrm{adoc})$ and URIs $u, u'$ with
$u \in \mathrm{dom}(\mathrm{adoc})$ it holds that $u' \in [\![\mathit{ne}]\!]^u_W$ if and only if $\{?x \mapsto u'\} \in [\![q(?x)]\!]^{\{u\}}_W$.

Theorem 3. For every NautiLOD expression $ne$, there exists an LDQL query $q(?x)$, with $?x$ a free variable, that is equivalent to $ne$.

Proof (Sketch). The proof begins with a simple translation that replaces every $p \in \mathcal{U}$ in a NautiLOD expression by a link pattern $\langle +, p, \_\rangle$. For instance, the expression $p_1/p_2^*$ is translated into $\langle +, p_1, \_\rangle/\langle +, p_2, \_\rangle^*$. To translate $\langle\_\rangle$ and $[\langle \mathsf{ASK}\ P\rangle]$ we use $\langle ?v, q\rangle$. The complete translation poses several other complications (as described in the appendix). In particular, the last step of NautiLOD expressions must be translated by using a SPARQL pattern and not an LPE. For this we use the following property. Given a regular expression $r$ that does not generate the empty word, one can always write $r$ as $r_1/a_1 \,|\, \cdots \,|\, r_k/a_k$, where the $a_i$'s are base symbols of the alphabet. Thus, we can translate $r$ by using LPEs to translate the $r_i$'s as outlined above; next, translate the $a_i$'s by using a method similar to the proof of Theorem 2; and finally use UNION for $|$.

Along the same lines as Theorem 1, one can prove the following result.

Theorem 4. There exists an LDQL query $q(?x)$ that cannot be expressed in NautiLOD.

³ In [7], all URIs have an assigned set of RDF triples (which may be empty). In our data model one can have URIs not in $\mathrm{dom}(\mathrm{adoc})$. Hence, to properly capture the semantics of NautiLOD in terms of our data model, we have to introduce conditions of the form "$u' \in \mathrm{dom}(\mathrm{adoc})$".

4.3 Comparison with SPARQL under Reachability-Based Query Semantics

In [11] the author introduces a family of reachability-based query semantics based on which SPARQL graph patterns can be used as a query language for Linked Data on the WWW. Similar to how the scope of the SPARQL part of a basic LDQL query is restricted to particular documents, reachability-based semantics restrict the scope of SPARQL queries to documents that can be reached by traversing a well-defined set of data links. To specify what data links belong to such a set, the notion of a reachability criterion is used; that is, a function $c\colon \mathcal{T} \times \mathcal{U} \times \mathcal{P} \to \{\mathrm{true}, \mathrm{false}\}$, where $\mathcal{P}$ denotes the set of all SPARQL graph patterns. Then, given such a reachability criterion $c$, a finite set $S$ of URIs and a SPARQL graph pattern $P$, a document $d \in D$ is $(c, S, P)$-reachable in a Web of Linked Data $W = (D, \mathrm{adoc})$ if any of the following two conditions holds:

1. There exists a URI $u \in S$ such that $\mathrm{adoc}(u) = d$; or

2. there exists a link graph edge $(d_{\mathrm{src}}, (t, u), d_{\mathrm{tgt}}) \in G_W$ such that (i) $d_{\mathrm{src}}$ is $(c, S, P)$-reachable in $W$, (ii) $c(t, u, P) = \mathrm{true}$, and (iii) $d_{\mathrm{tgt}} = d$.

Notice how the second condition restricts the notion of reachability by ignoring data links that do not satisfy the given reachability criterion $c$. Concrete examples of reachability criteria are $c_{\mathrm{All}}$, $c_{\mathrm{None}}$, and $c_{\mathrm{Match}}$ [11], where $c_{\mathrm{All}}$ selects all data links and $c_{\mathrm{None}}$ ignores all data links; i.e., $c_{\mathrm{All}}(t, u, P) = \mathrm{true}$ and $c_{\mathrm{None}}(t, u, P) = \mathrm{false}$ for all tuples $(t, u, P) \in \mathcal{T} \times \mathcal{U} \times \mathcal{P}$. In contrast to such an all-or-nothing strategy, criterion $c_{\mathrm{Match}}$ returns true for every data link whose triple matches a triple pattern of the given graph pattern; formally, $c_{\mathrm{Match}}(t, u, P) = \mathrm{true}$ if and only if there exists some solution mapping $\mu$ such that $\mu[tp] = t$ for some triple pattern $tp$ that is contained in $P$.

Given the notion of a reachability criterion, it is possible to define a family of (reachability-based) query semantics for SPARQL. To this end, let $c$ be a reachability criterion, let $S$ be a finite set of URIs, and let $P$ be a SPARQL graph pattern. Then, for any Web of Linked Data $W = (D, \mathrm{adoc})$, the $S$-based evaluation of $P$ over $W$ under $c$-semantics is the set of solution mappings $[\![P]\!]_G$, where $G$ is the RDF graph that consists of all triples from all documents that are $(c, S, P)$-reachable in $W$.
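To illustrate how a reachability criterion prunes the traversal, the sketch below computes the $(c, S, P)$-reachable documents of a finite toy Web and the $S$-based evaluation of $P$ under $c$-semantics. It rests on simplifying assumptions of ours: each URI in $\mathrm{dom}(\mathrm{adoc})$ has its own document, $P$ is restricted to a basic graph pattern given as a list of triple patterns (variables start with "?"), and literals are the terms written in double quotes. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Web = Dict[str, Set[Triple]]    # assumed: u -> data(adoc(u)), one document per URI
Pattern = List[Triple]          # a basic graph pattern; variables start with "?"
Criterion = Callable[[Triple, str, Pattern], bool]


def is_uri(term: str) -> bool:
    return not term.startswith('"')  # assumption: literals are written in quotes


def match(tp: Triple, t: Triple, mu: Optional[dict] = None) -> Optional[dict]:
    """Extend mu so that mu[tp] = t, or return None if that is impossible."""
    mu = dict(mu or {})
    for x, y in zip(tp, t):
        if x.startswith("?"):
            if mu.setdefault(x, y) != y:
                return None
        elif x != y:
            return None
    return mu


def c_all(t: Triple, u: str, P: Pattern) -> bool:
    return True


def c_none(t: Triple, u: str, P: Pattern) -> bool:
    return False


def c_match(t: Triple, u: str, P: Pattern) -> bool:
    return any(match(tp, t) is not None for tp in P)


def reachable(W: Web, S: Set[str], P: Pattern, c: Criterion) -> Set[str]:
    """URIs whose documents are (c, S, P)-reachable in W."""
    docs = {u for u in S if u in W}           # condition 1
    todo = list(docs)
    while todo:                               # condition 2: follow links accepted by c
        src = todo.pop()
        for t in W[src]:
            for u in t:
                if is_uri(u) and u in W and u not in docs and c(t, u, P):
                    docs.add(u)
                    todo.append(u)
    return docs


def s_based_eval(W: Web, S: Set[str], P: Pattern, c: Criterion) -> List[dict]:
    G = set().union(*(W[d] for d in reachable(W, S, P, c)))
    sols = [{}]
    for tp in P:                              # naive BGP evaluation over the merged graph G
        sols = [m2 for m in sols for t in G if (m2 := match(tp, t, m)) is not None]
    return sols
```

With seed URI ex:a and the pattern $((?x, \mathit{knows}, ?y)\ \mathsf{AND}\ (?y, \mathit{name}, ?n))$ used in the test below, $c_{\mathrm{None}}$ keeps only the document of ex:a, $c_{\mathrm{Match}}$ additionally reaches ex:b (via the matching `knows` triple) but not ex:c, and $c_{\mathrm{All}}$ reaches all three documents.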

While there exists an infinite number of possible reachability criteria, in this paper we focus on $c_{\mathrm{All}}$, $c_{\mathrm{None}}$, and $c_{\mathrm{Match}}$. The following two results show that LDQL is strictly more expressive than SPARQL graph patterns under any of these three query semantics.

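Theorem 5. Let $c \in \{c_{\mathrm{All}}, c_{\mathrm{None}}, c_{\mathrm{Match}}\}$. For every SPARQL graph pattern $P$ there exists an LDQL query $q$ such that the $S$-based evaluation of $P$ over $W$ under $c$-semantics equals $[\![q]\!]^W_S$ for every Web of Linked Data $W$ and every finite set $S \subseteq \mathcal{U}$.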

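Proof (Sketch). We only sketch the case of $c_{\mathrm{All}}$-semantics. In this case, one can prove that an LPE $lpe^{\mathrm{All}}$ that follows every data link simulates the reachability criterion $c_{\mathrm{All}}$ and, thus, the $S$-based evaluation of $P$ over $W$ under $c_{\mathrm{All}}$-semantics coincides with $[\![\langle lpe^{\mathrm{All}}, P\rangle]\!]^W_S$. One can also find LPEs to simulate $c_{\mathrm{None}}$ and $c_{\mathrm{Match}}$.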

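Theorem 6. Let $c \in \{c_{\mathrm{All}}, c_{\mathrm{None}}, c_{\mathrm{Match}}\}$. There exists an LDQL query $q$ for which there does not exist a SPARQL pattern $P$ such that the $S$-based evaluation of $P$ over $W$ under $c$-semantics equals $[\![q]\!]^W_S$ for every $W$ and every finite $S \subseteq \mathcal{U}$.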

5 Web-Safeness of LDQL Queries 

In this section we study the "Web-safeness" of LDQL queries where, informally, we call a query Web-safe if a complete execution of the query over the WWW is possible in practice (which is not the case for all LDQL queries, as we shall see). To provide a more formal definition of this notion of Web-safeness we make the following observations. While the mathematical structures introduced by our data model capture the notion of Linked Data on the WWW formally (and, thus, allow us to provide a formal semantics for LDQL queries), in practice, these structures are not available completely for the WWW. For instance, given that an infinite number of strings can be used as HTTP URIs [6], we cannot assume complete information about which URIs are in the domain of the partial function adoc (i.e., can be looked up to retrieve some document) and which are not; in fact, disclosing this information would require a process that systematically tries to look up every possible HTTP URI and, thus, would never terminate. Therefore, it is also impossible to guarantee the discovery of every document in the set $D$ (without looking up an infinite number of URIs). Consequently, any query whose execution requires a complete enumeration of this set is not feasible in practice. Based on these observations, we define Web-safeness of LDQL queries as follows.

Definition 6. An LDQL query $q$ is Web-safe if there exists an algorithm that, for any finite Web of Linked Data $W = (D, \mathrm{adoc})$ and any finite set $S$ of URIs, computes $[\![q]\!]^W_S$ by looking up only a finite number of URIs without assuming an a priori availability of any information about the sets $D$ and $\mathrm{dom}(\mathrm{adoc})$.

Example 6. Recall our example queries $q^1_{ex}$, $q^2_{ex}$, and $q^3_{ex}$ (cf. Example 5). For query $q^2_{ex} = (\mathsf{SEED}\ ?x\ \langle \varepsilon, (?x, p_1, ?z)\rangle)$, any URI $u \in \mathcal{U}$ may be used to obtain a nonempty subset of the query result as long as a lookup of $u$ retrieves a document whose data includes RDF triples that match $(u, p_1, ?z)$. Therefore, without access to $D$ or $\mathrm{dom}(\mathrm{adoc})$ of the queried Web $W = (D, \mathrm{adoc})$, the completeness of the computed query result can be guaranteed only by checking each of the infinitely many possible HTTP URIs. Hence, query $q^2_{ex}$ is not Web-safe. In contrast, although it contains $q^2_{ex}$ as a subquery, query $q^3_{ex} = (q^2_{ex}\ \mathsf{AND}\ q^1_{ex})$ is Web-safe, and so is $q^1_{ex} = \langle lpe_{ex}, B_{ex}\rangle$. Given $u_a$ as seed URI, a possible execution algorithm for $q^1_{ex}$ may first compute $[\![lpe_{ex}]\!]^W_{u_a}$ by traversing the queried Web $W$ based on $lpe_{ex}$. Thereafter, the algorithm retrieves documents by looking up all URIs $u \in [\![lpe_{ex}]\!]^W_{u_a}$ (or simply keeps these documents after the traversal); and, finally, the algorithm evaluates pattern $B_{ex}$ over the union of the RDF data in the retrieved documents. If $W$ is finite (i.e., contains a finite number of documents), the traversal process requires a finite number of URI lookups only, and so does the retrieval of documents in the second step; the final step does not look up any URI. To see that $q^3_{ex}$ is also Web-safe, we note that after executing subquery $q^1_{ex}$ (e.g., by using the algorithm outlined before), the execution of the other (non-Web-safe) subquery $q^2_{ex}$ can be reduced to a finite number of URI lookups, namely the URIs bound to variable $?x$ in the solution mappings obtained for subquery $q^1_{ex}$. Although any other URI may also be used to obtain solution mappings for $q^2_{ex}$, such solution mappings cannot be joined with any of the solution mappings for $q^1_{ex}$ and, thus, are irrelevant for the result of $q^3_{ex}$.

The example illustrates that there exist LDQL queries that are not Web-safe. In fact, it is not difficult to see that the argument for the non-Web-safeness of query $q^2_{ex}$ as made in the example can be applied to any LDQL query of the form $(\mathsf{SEED}\ ?x\ q)$ where subquery $q$ is a (satisfiable) basic LDQL query; that is, none of these queries is Web-safe. However, the example also shows that more complex queries that contain such non-Web-safe subqueries may still be Web-safe. Therefore, we now show properties to identify LDQL queries that are Web-safe even if some of their subqueries are not. We begin with queries of the forms $\langle lpe, P\rangle$, $\pi_V q$, $(\mathsf{SEED}\ U\ q)$, and $(q_1\ \mathsf{UNION} \cdots \mathsf{UNION}\ q_n)$.


Proposition 2. An LDQL query $q$ is Web-safe if any of the following properties holds:

1. Query $q$ is of the form $\langle lpe, P\rangle$ and $lpe$ is Web-safe, where we call an LPE Web-safe if either (i) it is of the form $\langle ?v, q'\rangle$ and LDQL query $q'$ is Web-safe, or (ii) it is of any form other than $\langle ?v, q'\rangle$ and all its subexpressions (if any) are Web-safe LPEs;

2. Query $q$ is of the form $\pi_V q'$ or $(\mathsf{SEED}\ U\ q')$, and subquery $q'$ is Web-safe; or

3. Query $q$ is of the form $(q_1\ \mathsf{UNION} \cdots \mathsf{UNION}\ q_n)$ and each $q_i$ $(1 \leq i \leq n)$ is Web-safe.

It remains to discuss LDQL queries of the form $(q_1\ \mathsf{AND} \cdots \mathsf{AND}\ q_m)$. Our discussion of query $q^3_{ex}$ in Example 6 suggests that such queries can be shown to be Web-safe if all non-Web-safe subqueries are of the form $(\mathsf{SEED}\ ?v\ q)$ and it is possible to execute these subqueries by using variable bindings obtained from other subqueries. A necessary condition for this execution strategy is that the variable in question (i.e., $?v$) is guaranteed to be bound in every possible solution mapping obtained from the other subqueries.

To allow for an automated verification of this condition we adopt Buil-Aranda et al.'s notion of strongly bound variables [4]. To this end, for any SPARQL graph pattern $P$, let $\mathrm{sbvars}(P)$ denote the set of strongly bound variables in $P$ as defined by Buil-Aranda et al. [4]. For the sake of space, we do not repeat the definition here. However, we emphasize that $\mathrm{sbvars}(P)$ can be constructed recursively, and each variable in $\mathrm{sbvars}(P)$ is guaranteed to be bound in every possible solution for $P$ [4, Proposition 1]. To carry over these properties to LDQL queries, we use the notion of strongly bound variables in SPARQL patterns to define the following notion of strongly bound variables in LDQL queries; thereafter, in Lemma 3, we show the desired boundedness guarantee.

Definition 7. The set of strongly bound variables in an LDQL query $q$, denoted by $\mathrm{sbvars}(q)$, is defined recursively as follows:

1. If $q$ is of the form $\langle lpe, P\rangle$, then $\mathrm{sbvars}(q) = \mathrm{sbvars}(P)$.

2. If $q$ is of the form $(q_1\ \mathsf{AND}\ q_2)$, then $\mathrm{sbvars}(q) = \mathrm{sbvars}(q_1) \cup \mathrm{sbvars}(q_2)$.

3. If $q$ is of the form $(q_1\ \mathsf{UNION}\ q_2)$, then $\mathrm{sbvars}(q) = \mathrm{sbvars}(q_1) \cap \mathrm{sbvars}(q_2)$.

4. If $q$ is of the form $\pi_V q'$, then $\mathrm{sbvars}(q) = \mathrm{sbvars}(q') \cap V$.

5. If $q$ is of the form $(\mathsf{SEED}\ U\ q')$, then $\mathrm{sbvars}(q) = \mathrm{sbvars}(q')$.

6. If $q$ is of the form $(\mathsf{SEED}\ ?v\ q')$, then $\mathrm{sbvars}(q) = \mathrm{sbvars}(q') \cup \{?v\}$.
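Definition 7 translates directly into a recursive function. In the following Python sketch the class names are hypothetical, and since we do not repeat Buil-Aranda et al.'s definition of $\mathrm{sbvars}(P)$ for SPARQL patterns, each basic query simply carries a precomputed set $\mathrm{sbvars}(P)$ (an assumption of the sketch).

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Basic:     # <lpe, P>; sbvars(P) is assumed to be precomputed as in [4]
    lpe: str
    sb: FrozenSet[str]


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Union:
    left: object
    right: object


@dataclass(frozen=True)
class Project:   # pi_V q
    vars: FrozenSet[str]
    sub: object


@dataclass(frozen=True)
class SeedURIs:  # (SEED U q)
    uris: FrozenSet[str]
    sub: object


@dataclass(frozen=True)
class SeedVar:   # (SEED ?v q)
    var: str
    sub: object


def sbvars(q) -> FrozenSet[str]:
    if isinstance(q, Basic):
        return q.sb                                # rule 1
    if isinstance(q, And):
        return sbvars(q.left) | sbvars(q.right)    # rule 2
    if isinstance(q, Union):
        return sbvars(q.left) & sbvars(q.right)    # rule 3
    if isinstance(q, Project):
        return sbvars(q.sub) & q.vars              # rule 4
    if isinstance(q, SeedURIs):
        return sbvars(q.sub)                       # rule 5
    if isinstance(q, SeedVar):
        return sbvars(q.sub) | {q.var}             # rule 6
    raise TypeError(f"unknown query: {q!r}")
```

For instance, for $((\mathsf{SEED}\ ?x\ q_a)\ \mathsf{AND}\ (q_b\ \mathsf{UNION}\ q_c))$ only the variables bound by $q_a$, the seed variable $?x$, and the variables common to $q_b$ and $q_c$ are strongly bound.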

Lemma 3. Let $q$ be an LDQL query. For every finite set $S$ of URIs, every Web of Linked Data $W$, and every solution mapping $\mu \in [\![q]\!]^W_S$, it holds that $\mathrm{sbvars}(q) \subseteq \mathrm{dom}(\mu)$.

We are now ready to show the following result. 

Theorem 7. An LDQL query of the form $(q_1\ \mathsf{AND}\ q_2\ \mathsf{AND} \cdots \mathsf{AND}\ q_m)$ is Web-safe if there exists a total order $\prec$ over the set of subqueries $\{q_1, q_2, \ldots, q_m\}$ such that for each subquery $q_i$ $(1 \leq i \leq m)$ it holds that either (i) $q_i$ is Web-safe or (ii) $q_i$ is of the form $(\mathsf{SEED}\ ?v\ q)$, where $q$ is Web-safe and $?v \in \mathrm{sbvars}(q_j)$ for some subquery $q_j$ with $q_j \prec q_i$.

Proof (Sketch). We prove Theorem 7 based on an iterative algorithm that generalizes the execution of query $q^3_{ex}$ as outlined in Example 6. That is, the algorithm executes the subqueries $q_1, \ldots, q_m$ sequentially in the order $\prec$ such that each iteration executes one of the subqueries by using the solution mappings computed during the previous iteration.
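The existence of the total order required by Theorem 7 can be checked greedily: whenever a conjunct satisfies condition (i), or condition (ii) with respect to the variables already guaranteed by earlier conjuncts, placing it next can only enlarge that set of variables, so picking any eligible conjunct never blocks a later one. The following Python sketch of this check is our own illustration; it assumes that, for every conjunct, the Web-safeness of the conjunct (or of its SEED subquery) and its set $\mathrm{sbvars}$ are already known, e.g., from Proposition 2 and Definition 7.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Conjunct:
    name: str
    web_safe: bool                    # condition (i) is already known to hold
    seed_var: Optional[str] = None    # "?v" if the conjunct is (SEED ?v q)
    inner_web_safe: bool = False      # whether q in (SEED ?v q) is Web-safe
    sbvars: FrozenSet[str] = frozenset()


def find_order(conjuncts: List[Conjunct]) -> Optional[List[Conjunct]]:
    """Return an order satisfying the conditions of Theorem 7, or None if none exists."""
    remaining, order, bound = list(conjuncts), [], set()
    while remaining:
        for c in remaining:
            seed_ok = (c.seed_var is not None and c.inner_web_safe
                       and c.seed_var in bound)  # ?v strongly bound by an earlier conjunct
            if c.web_safe or seed_ok:
                order.append(c)
                bound |= c.sbvars
                remaining.remove(c)
                break
        else:  # no remaining conjunct can be placed next
            return None
    return order
```

In the situation of Example 6, the traversal-based conjunct is placed first, after which the SEED conjunct becomes eligible because its seed variable $?x$ is strongly bound by the first one.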


With the results in this section we have all ingredients to devise a procedure to show Web-safeness for a large number of queries (including queries that are arbitrarily nested). However, as a potential limitation of such a procedure we note that Theorem 7 can be applied only in cases in which all non-Web-safe subqueries are of the form $(\mathsf{SEED}\ ?v\ q)$. For instance, the theorem cannot be applied to show that an LDQL query of the form $(q_1\ \mathsf{AND}\ (q_2\ \mathsf{UNION}\ (\mathsf{SEED}\ ?x\ q_3)))$ is Web-safe if $?x \in \mathrm{sbvars}(q_1)$ and $q_1$, $q_2$ and $q_3$ are Web-safe. On the other hand, for the semantically equivalent query $((q_1\ \mathsf{AND}\ q_2)\ \mathsf{UNION}\ (q_1\ \mathsf{AND}\ (\mathsf{SEED}\ ?x\ q_3)))$ we can show Web-safeness based on Theorem 7 (and Proposition 2). Fortunately, we may leverage the following fact to improve the effectiveness of applying Theorem 7 in the procedure that we aim to devise.

Fact 1. If an LDQL query $q$ is Web-safe, then so is any LDQL query $q'$ with $q' \equiv q$.

As a consequence of Fact 1, we may use the equivalences in Lemma 2 to rewrite 
a given query into an equivalent query that is more suitable for testing Web-safeness 
based on our results. To this end, we introduce specific normal forms for LDQL queries: 

Definition 8. An LDQL query is in UNION-free normal form if it is of the form $(q_1\ \mathsf{AND} \cdots \mathsf{AND}\ q_m)$ with $m \geq 1$ and each $q_i$ $(1 \leq i \leq m)$ is either (i) a basic LDQL query or (ii) of the form $\pi_V q$, $(\mathsf{SEED}\ U\ q)$ or $(\mathsf{SEED}\ ?v\ q)$ such that subquery $q$ is in UNION-free normal form. An LDQL query is in UNION normal form if it is of the form $(q_1\ \mathsf{UNION} \cdots \mathsf{UNION}\ q_n)$ with $n \geq 1$ and each $q_i$ $(1 \leq i \leq n)$ is in UNION-free normal form.

The following result is an immediate consequence of Lemma 2. 

Corollary 1. Every LDQL query is equivalent to an LDQL query in UNION normal form.
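A rewriting into UNION normal form can be sketched as follows. Since Lemma 2 is not restated in this section, we assume here that the equivalences used are the distribution of AND, $\pi_V$, $(\mathsf{SEED}\ U\ \cdot)$ and $(\mathsf{SEED}\ ?v\ \cdot)$ over UNION; the class and function names are hypothetical, and the result may be exponentially larger than the input query because of the AND case.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass(frozen=True)
class Basic:
    name: str


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Union:
    left: object
    right: object


@dataclass(frozen=True)
class Project:
    vars: tuple
    sub: object


@dataclass(frozen=True)
class Seed:  # (SEED U q) or (SEED ?v q); src is the URI set or the variable
    src: str
    sub: object


def disjuncts(q) -> List[object]:
    """UNION-free queries whose UNION is equivalent to q (assumed distributivity rules)."""
    if isinstance(q, Basic):
        return [q]
    if isinstance(q, Union):
        return disjuncts(q.left) + disjuncts(q.right)
    if isinstance(q, And):      # (a UNION b) AND c  ==>  (a AND c) UNION (b AND c)
        return [And(a, b) for a in disjuncts(q.left) for b in disjuncts(q.right)]
    if isinstance(q, Project):  # pi_V (a UNION b)  ==>  pi_V a UNION pi_V b
        return [Project(q.vars, d) for d in disjuncts(q.sub)]
    if isinstance(q, Seed):     # SEED x (a UNION b)  ==>  SEED x a UNION SEED x b
        return [Seed(q.src, d) for d in disjuncts(q.sub)]
    raise TypeError(f"unknown query: {q!r}")


def union_normal_form(q):
    return reduce(Union, disjuncts(q))


def show(q) -> str:
    if isinstance(q, Basic):
        return q.name
    if isinstance(q, And):
        return f"({show(q.left)} AND {show(q.right)})"
    if isinstance(q, Union):
        return f"({show(q.left)} UNION {show(q.right)})"
    if isinstance(q, Project):
        return f"pi[{','.join(q.vars)}] {show(q.sub)}"
    return f"(SEED {q.src} {show(q.sub)})"
```

Applied to the query $(q_1\ \mathsf{AND}\ (q_2\ \mathsf{UNION}\ (\mathsf{SEED}\ ?x\ q_3)))$ discussed above, the sketch yields $((q_1\ \mathsf{AND}\ q_2)\ \mathsf{UNION}\ (q_1\ \mathsf{AND}\ (\mathsf{SEED}\ ?x\ q_3)))$, i.e., the equivalent query to which Theorem 7 and Proposition 2 apply.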

In conjunction with Fact 1, Corollary 1 allows us to focus on LDQL queries in UNION normal form without losing generality. We are now ready to specify our procedure that applies the results in this paper to test a given LDQL query $q$ for Web-safeness. First, by using the equivalences in Lemma 2, the query has to be rewritten into a semantically equivalent LDQL query $q_{nf} = (q_1\ \mathsf{UNION} \cdots \mathsf{UNION}\ q_n)$ that is in UNION normal form. Next, the following test has to be repeated for every subquery $q_i$ $(1 \leq i \leq n)$; recall that each of these subqueries is in UNION-free normal form, i.e., $q_i = (q^i_1\ \mathsf{AND} \cdots \mathsf{AND}\ q^i_{m_i})$. The test is to find an order for its subqueries $q^i_1, \ldots, q^i_{m_i}$ that satisfies the conditions in Theorem 7. Every top-level subquery $q_i$ $(1 \leq i \leq n)$ for which such an order exists is Web-safe (cf. Theorem 7). If all top-level subqueries are identified to be Web-safe by this test, then $q_{nf}$ is Web-safe (cf. Proposition 2), and so is $q$ (cf. Fact 1).

The given conditions are sufficient to show Web-safeness of LDQL queries. It remains open whether there exists a (decidable) sufficient and necessary condition for Web-safeness.

6 Concluding Remarks and Future Work 

LDQL, the query language that we introduce in this paper, allows users to express queries over Linked Data on the WWW. We defined LDQL such that navigational features for selecting the query-relevant documents on the Web are separate from patterns that are meant to be evaluated over the data in the selected documents. This separation distinguishes LDQL from other approaches to express queries over Linked Data.


We focused on expressiveness, by comparing LDQL with previous formalisms, and on the notion of Web-safeness. Several topics remain open for future work. One of them is the complexity of query evaluation. A classical complexity analysis is easy to perform if we assume that all the data and documents are available as if they were in a centralized repository, and that they can be processed via a RAM machine model. We conjecture that under this model the data complexity of evaluating LDQL will be polynomial. Nevertheless, a more interesting complexity analysis should consider a model that captures the inherent way of accessing the Web of Linked Data via HTTP requests, the overhead of data communication and transfer, the distribution of data and documents, etc. A more practical direction for future research on LDQL is the development of approaches to actually implement LDQL queries efficiently.

Acknowledgements Pérez is supported by the Millennium Nucleus Center for Semantic Web Research, Grant NC120004, and Fondecyt grant 1140790.

References 

1. Arenas, M., Conca, S., Pérez, J.: Counting beyond a yottabyte, or how SPARQL 1.1 property paths will prevent adoption of the standard. In: WWW 2012. pp. 629-638 (2012)

2. Arenas, M., Gutierrez, C., Pérez, J.: On the Semantics of SPARQL. In: Semantic Web Information Management - A Model-Based Perspective, chap. 13, pp. 281-307. Springer (2009)

3. Berners-Lee, T.: Linked Data. At http://www.w3.org/DesignIssues/LinkedData.html (2006)

4. Buil-Aranda, C., Arenas, M., Corcho, O.: Semantics and Optimization of the SPARQL 1.1 Federation Extension. In: Proc. 8th Extended Semantic Web Conf. (2011)

5. Cyganiak, R., Wood, D., Lanthaler, M.: RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation (Feb 2014)

6. Fielding, R., Gettys, J., Mogul, J.C., Frystyk, H., Masinter, L., Leach, P.J., Berners-Lee, T.: Hypertext Transfer Protocol - HTTP/1.1 (Jun 1999)

7. Fionda, V., Pirrò, G., Gutierrez, C.: NautiLOD: A Formal Language for the Web of Data Graph. ACM Transactions on the Web 9(1), 5:1-5:43 (2015)

8. Harris, S., Seaborne, A., Prud'hommeaux, E.: SPARQL 1.1 Query Language. W3C Recommendation (Mar 2013)

9. Harth, A., Speiser, S.: On Completeness Classes for Query Evaluation on Linked Data. In: Proc. 26th AAAI Conf. (2012)

10. Hartig, O.: LDQL: A Language for Linked Data Queries. In: AMW 2015

11. Hartig, O.: SPARQL for a Web of Linked Data: Semantics and Computability. In: Proc. 9th Extended Semantic Web Conf. (2012)

12. Hartig, O.: An Overview on Execution Strategies for Linked Data Queries. Datenbank-Spektrum 13(2) (2013)

13. Hartig, O., Pérez, J.: LDQL: A Query Language for the Web of Linked Data. In: Proc. 14th Int. Semantic Web Conf. (2015)

14. Hartig, O., Pirrò, G.: A Context-Based Semantics for SPARQL Property Paths over the Web. In: Proc. 12th Extended Semantic Web Conf. (2015)

15. Mendelzon, A.O., Mihaila, G.A., Milo, T.: Querying the World Wide Web. In: PDIS (1996)

16. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and Complexity of SPARQL. ACM Transactions on Database Systems 34 (2009)

17. Pérez, J., Arenas, M., Gutierrez, C.: nSPARQL: A Navigational Language for RDF. J. Web Sem. 8(4), 255-270 (2010)

18. Umbrich, J., Hogan, A., Polleres, A., Decker, S.: Link Traversal Querying for a Diverse Web of Data. Semantic Web Journal (2014)


A Proofs 


A.1 Proof of Lemma 1

We formalize the claims in Lemma 1 as follows. Let $q_1$, $q_2$, and $q_3$ be LDQL queries; the following semantic equivalences hold:

$$(q_1\ \mathsf{AND}\ q_2) \equiv (q_2\ \mathsf{AND}\ q_1) \qquad (5)$$

$$(q_1\ \mathsf{UNION}\ q_2) \equiv (q_2\ \mathsf{UNION}\ q_1) \qquad (6)$$

$$(q_1\ \mathsf{AND}\ (q_2\ \mathsf{AND}\ q_3)) \equiv ((q_1\ \mathsf{AND}\ q_2)\ \mathsf{AND}\ q_3) \qquad (7)$$

$$(q_1\ \mathsf{UNION}\ (q_2\ \mathsf{UNION}\ q_3)) \equiv ((q_1\ \mathsf{UNION}\ q_2)\ \mathsf{UNION}\ q_3) \qquad (8)$$

Since the definitions of the LDQL operators AND and UNION are equivalent to their SPARQL counterparts, these semantic equivalences follow from the corresponding equivalences for SPARQL graph patterns shown by Pérez et al. [16, Lemma 2.5].

A.2 Proof of Lemma 2 

The equivalences follow directly from the definition of every operator. 

A.3 Proof of Proposition 1 

The proof is based on a recursive translation of link path expressions, beginning with link patterns. Let $\langle y_1, y_2, y_3\rangle$ be a link pattern. We construct an LPE $\mathrm{trans}_L(\langle y_1, y_2, y_3\rangle)$ as follows. Assume that $y_1 = \_$; then we construct the LDQL query

$$q_1 = \langle \varepsilon, (\mathsf{GRAPH}\ ?u\ (?out, Y_2, Y_3))\rangle$$

where (i) if $y_2 = +$ then $Y_2 = ?u$, (ii) if $y_2 \in \mathcal{U}$ then $Y_2 = y_2$, and (iii) if $y_2 = \_$ then $Y_2 = ?y_2$. Similarly, (i) if $y_3 = +$ then $Y_3 = ?u$, (ii) if $y_3 \in \mathcal{U}$ then $Y_3 = y_3$, and (iii) if $y_3 = \_$ then $Y_3 = ?y_3$. If $y_2 = \_$ then one can construct query $q_2 = \langle \varepsilon, (\mathsf{GRAPH}\ ?u\ (Y_1, ?out, Y_3))\rangle$, and if $y_3 = \_$ query $q_3 = \langle \varepsilon, (\mathsf{GRAPH}\ ?u\ (Y_1, Y_2, ?out))\rangle$, following a similar process as for $q_1$. Now consider the query $q$ which is the UNION of the above queries for every $y_i = \_$. Then LPE $\mathrm{trans}_L(\langle y_1, y_2, y_3\rangle)$ is constructed as

$$\mathrm{trans}_L(\langle y_1, y_2, y_3\rangle) = \langle ?out, q\rangle.$$

It is not difficult to prove that $[\![\mathrm{trans}_L(\langle y_1, y_2, y_3\rangle)]\!]^W_u = [\![\langle y_1, y_2, y_3\rangle]\!]^W_u$.
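The construction of $\mathrm{trans}_L$ for a single link pattern is mechanical, as the following Python sketch shows. It produces the LDQL expression as a plain string; the ASCII notation `<eps, ...>` for $\langle\varepsilon, \ldots\rangle$, the variable names `?y1`, `?y2`, `?y3` for unconstrained positions, and the function name are our own choices.

```python
from typing import Tuple


def trans_link_pattern(lp: Tuple[str, str, str]) -> str:
    """Build trans_L(<y1, y2, y3>) as an LDQL string ('<eps, .>' denotes a basic query)."""
    def term(j: int, y: str) -> str:
        if y == "+":
            return "?u"        # the URI of the current document (bound by GRAPH ?u)
        if y == "_":
            return f"?y{j}"    # unconstrained position
        return y               # a fixed URI

    parts = []
    for i, yi in enumerate(lp, start=1):
        if yi == "_":          # position i may deliver the output URI ?out
            terms = ["?out" if j == i else term(j, yj) for j, yj in enumerate(lp, start=1)]
            parts.append(f"<eps, (GRAPH ?u ({', '.join(terms)}))>")
    if not parts:
        raise ValueError("the construction assumes at least one '_' position")
    q = parts[0]
    for p in parts[1:]:        # UNION of q_i for every y_i = _
        q = f"({q} UNION {p})"
    return f"<?out, {q}>"
```

For the link pattern $\langle +, p, \_\rangle$ only position 3 can deliver an output URI, so the result is the single query $\langle ?out, \langle\varepsilon, (\mathsf{GRAPH}\ ?u\ (?u, p, ?out))\rangle\rangle$.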
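We now define the translation in general:

- For the case of LPE $r = r_1/r_2$, we have that $\mathrm{trans}_L(r) = \langle ?v, q\rangle$ where $q$ is

$$(\langle \mathrm{trans}_L(r_1), (\mathsf{GRAPH}\ ?x\ \{\})\rangle\ \mathsf{AND}\ (\mathsf{SEED}\ ?x\ \langle \mathrm{trans}_L(r_2), (\mathsf{GRAPH}\ ?v\ \{\})\rangle)).$$

- For the case of LPE $r = r_1|r_2$, we have that $\mathrm{trans}_L(r) = \langle ?v, q\rangle$ where $q$ is

$$(\langle \mathrm{trans}_L(r_1), (\mathsf{GRAPH}\ ?v\ \{\})\rangle\ \mathsf{UNION}\ \langle \mathrm{trans}_L(r_2), (\mathsf{GRAPH}\ ?v\ \{\})\rangle).$$

- For the case of LPE $r = [r_1]$, we have that $\mathrm{trans}_L(r) = \langle ?v, q\rangle$ where $q$ is

$$(\langle \varepsilon, (\mathsf{GRAPH}\ ?v\ \{\})\rangle\ \mathsf{AND}\ \pi_{\{?v\}}(\mathsf{SEED}\ ?v\ \langle \mathrm{trans}_L(r_1), (\mathsf{GRAPH}\ ?x\ \{\})\rangle)).$$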

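The general proof proceeds by induction. We next prove that $[\![\mathrm{trans}_L(r_1|r_2)]\!]^W_u = [\![r_1|r_2]\!]^W_u$; the proofs for the other cases are similar. Thus assume that $u' \in [\![r_1|r_2]\!]^W_u$; then we know that $u' \in [\![r_1]\!]^W_u \cup [\![r_2]\!]^W_u$. If $u' \in [\![r_1]\!]^W_u$, then by induction hypothesis we know that $u' \in [\![\mathrm{trans}_L(r_1)]\!]^W_u$. Now notice that

$$[\![\langle \mathrm{trans}_L(r_1), (\mathsf{GRAPH}\ ?v\ \{\})\rangle]\!]^W_{\{u\}} = [\![(\mathsf{GRAPH}\ ?v\ \{\})]\!]^{\mathcal{D}},$$

where $\mathcal{D} = \mathrm{dataset}_W([\![\mathrm{trans}_L(r_1)]\!]^W_u)$. Thus, given that $u' \in [\![\mathrm{trans}_L(r_1)]\!]^W_u$, we know that $\mathcal{D}$ contains the named graph $(u', \mathrm{data}(\mathrm{adoc}(u')))$, which implies that $\{?v \mapsto u'\}$ is a solution for $[\![(\mathsf{GRAPH}\ ?v\ \{\})]\!]^{\mathcal{D}}$, and thus $\{?v \mapsto u'\} \in [\![\langle \mathrm{trans}_L(r_1), (\mathsf{GRAPH}\ ?v\ \{\})\rangle]\!]^W_{\{u\}}$. From this it is straightforward to conclude that $u' \in [\![\mathrm{trans}_L(r_1|r_2)]\!]^W_u$. The other direction is similar.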

A.4 Proof of Theorem 1 

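Consider the LDQL query $Q$ given by

$$(\mathsf{SEED}\ \{u\}\ \langle \langle +, p, \_\rangle, (?x, ?x, ?x)\rangle)$$

with $u, p \in \mathcal{U}$. Now assume that there exists a property path pattern $P$ and a set of URIs $S$ such that

$$[\![P]\!]^W = [\![Q]\!]^W_S$$

for every Web of Linked Data $W$. Let $u' \in \mathcal{U}$. Consider now $W_1$ having only two documents $d_1 = \{(u, p, u')\}$ and $d_2 = \{(a, a, a)\}$ such that $\mathrm{adoc}(u) = d_1$ and $\mathrm{adoc}(u') = d_2$. Moreover, consider $W_2$ having also two documents $d_1 = \{(u, p, u')\}$ and $d_3 = \{(b, b, b)\}$ such that $\mathrm{adoc}(u) = d_1$ and $\mathrm{adoc}(u') = d_3$, where $a, b \in \mathcal{U}$ are distinct and different from $u$ and $u'$. First notice that for every $S$ we have that

$$[\![Q]\!]^{W_1}_S = \{\{?x \mapsto a\}\} \neq \{\{?x \mapsto b\}\} = [\![Q]\!]^{W_2}_S.$$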

Notice that $C^{W_1}(u) = C^{W_2}(u) = \{(u, p, u')\}$ and $C^{W_1}(u') = C^{W_2}(u') = \emptyset$. In general, for every term $v \neq u$ it holds that $C^{W_1}(v) = C^{W_2}(v) = \emptyset$. This essentially shows that the context selectors $C^{W_1}$ and $C^{W_2}$ are equivalent. Given that the semantics of property paths is based on context selectors, it is easy to prove that for every PP-based SPARQL query $R$ we have that $[\![R]\!]^{W_1} = [\![R]\!]^{W_2}$. This can be done by induction on the construction of PP-based SPARQL queries. For example, the evaluation of a base PP-pattern of the form $(v, p, \beta)$, with $v \in \mathcal{U}$ and $\beta \in \mathcal{U} \cup \mathcal{V}$, over $W_1$ is given by

$$[\![(v, p, \beta)]\!]^{W_1} = \{\mu \mid \mathrm{dom}(\mu) = \{\beta\} \cap \mathcal{V} \text{ and } \mu[(v, p, \beta)] \in C^{W_1}(v)\},$$

which is equal to $[\![(v, p, \beta)]\!]^{W_2}$ since $C^{W_1}(v) = C^{W_2}(v)$. All the other cases for the construction of property paths are analogous. Moreover, since for the case of property path patterns the evaluation is the same over $W_1$ and over $W_2$, we have that for a general PP-based query using operators AND, UNION, OPT and so on, the evaluation is also the same. Thus we have that

$$[\![P]\!]^{W_1} = [\![P]\!]^{W_2}$$

but also that

$$[\![Q]\!]^{W_1}_S \neq [\![Q]\!]^{W_2}_S,$$

which contradicts the fact that $[\![P]\!]^W = [\![Q]\!]^W_S$ for every Web of Linked Data $W$.
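The core of this argument, namely that $W_1$ and $W_2$ cannot be distinguished by context selectors while $Q$ distinguishes them, can be checked mechanically on the two Webs. In the following Python sketch we assume that the context selector $C^W(v)$ returns the triples of $\mathrm{adoc}(v)$ whose subject is $v$ (the authoritative triples), we write `u2` for $u'$, and we hard-code the evaluation of $Q$ rather than implementing LDQL in general.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Web = Dict[str, Set[Triple]]  # assumed encoding: u -> data(adoc(u))

# The two Webs from the proof ("u2" plays the role of u').
W1: Web = {"u": {("u", "p", "u2")}, "u2": {("a", "a", "a")}}
W2: Web = {"u": {("u", "p", "u2")}, "u2": {("b", "b", "b")}}


def context(W: Web, v: str) -> Set[Triple]:
    """Assumed context selector: triples of adoc(v) with subject v (empty if v not in dom(adoc))."""
    return {t for t in W.get(v, set()) if t[0] == v}


def eval_Q(W: Web, u: str = "u", p: str = "p") -> List[dict]:
    """(SEED {u} <<+, p, _>, (?x, ?x, ?x)>): follow <+, p, _> once, then match (?x, ?x, ?x)."""
    targets = {o for (s, pp, o) in W.get(u, set()) if s == u and pp == p}
    graph = set().union(*(W[t] for t in targets if t in W))
    return [{"?x": s} for (s, pp, o) in graph if s == pp == o]
```

Every term of either Web has the same context in $W_1$ and $W_2$, yet `eval_Q` returns $\{?x \mapsto a\}$ for $W_1$ and $\{?x \mapsto b\}$ for $W_2$.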

A.5 Proof of Theorem 2 

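We associate to every property-path expression $r$ an LDQL query $Q_r(?x, ?y)$ with $?x$ and $?y$ as free variables. The definition of $Q_r(?x, ?y)$ is by induction on the construction of property-path expressions. In the construction, all the variables mentioned, besides $?x$ and $?y$, are considered as fresh variables.

- If $r \in \mathcal{U}$ then $Q_r(?x, ?y) = (\mathsf{SEED}\ ?x\ \langle\varepsilon, (?x, r, ?y)\rangle)$.

- If $r = !(u_1 | \cdots | u_k)$ with $u_i \in \mathcal{U}$ then $Q_r(?x, ?y)$ is defined as

$$(\mathsf{SEED}\ ?x\ \langle\varepsilon, ((?x, ?p, ?y)\ \mathsf{FILTER}\ (?p \neq u_1 \wedge \cdots \wedge ?p \neq u_k))\rangle).$$

- If $r = r_1/r_2$ then $Q_r(?x, ?y)$ is defined as

$$\pi_{\{?x,?y\}}(Q_{r_1}(?x, ?z)\ \mathsf{AND}\ Q_{r_2}(?z, ?y)).$$

- If $r = r_1|r_2$ then $Q_r(?x, ?y)$ is defined as

$$(Q_{r_1}(?x, ?y)\ \mathsf{UNION}\ Q_{r_2}(?x, ?y)).$$

- If $r = r_1^*$ then $Q_r(?x, ?y)$ is defined as follows. First consider the LDQL query

$$Q_\varepsilon(?x, ?y) = \pi_{\{?x,?y\}}(\mathsf{SEED}\ ?f\ \langle\varepsilon, P\rangle),$$

where $P$ is the following pattern:

$$\begin{aligned} P = {} & ((?x, ?p, ?o)\ \mathsf{AND}\ (?y, ?p, ?o)\ \mathsf{FILTER}\ (?x = ?y))\ \mathsf{UNION} \\ & ((?s, ?x, ?o)\ \mathsf{AND}\ (?s, ?y, ?o)\ \mathsf{FILTER}\ (?x = ?y))\ \mathsf{UNION} \\ & ((?s, ?p, ?x)\ \mathsf{AND}\ (?s, ?p, ?y)\ \mathsf{FILTER}\ (?x = ?y)) \end{aligned}$$

Now consider the LDQL query $Q_s(?v)$ defined as

$$Q_s(?v) = (\langle\varepsilon, (\mathsf{GRAPH}\ ?f\ \{\})\rangle\ \mathsf{AND}\ Q_{r_1}(?f, ?v)).$$

Then, query $Q_r(?x, ?y)$ is defined by

$$Q_\varepsilon(?x, ?y)\ \mathsf{UNION}\ \pi_{\{?x,?y\}}((\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^*, (\mathsf{GRAPH}\ ?z\ \{\})\rangle)\ \mathsf{AND}\ Q_{r_1}(?z, ?y)).$$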
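The inductive definition of $Q_r$ can be turned into a small generator of query strings. The Python sketch below is illustrative only: it writes $\langle\varepsilon, \ldots\rangle$ as `<eps, ...>`, $\pi_{\{?x,?y\}}$ as `pi[?x,?y]`, and $\wedge$ as `&&`, the class and function names are hypothetical, and it numbers every auxiliary variable so that nested stars never reuse a name (the definition above only requires these variables to be fresh).

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class IRI:     # r in U
    iri: str


@dataclass(frozen=True)
class NegSet:  # !(u1 | ... | uk)
    excluded: Tuple[str, ...]


@dataclass(frozen=True)
class SeqP:    # r1/r2
    r1: object
    r2: object


@dataclass(frozen=True)
class AltP:    # r1|r2
    r1: object
    r2: object


@dataclass(frozen=True)
class StarP:   # r1*
    r1: object


def fresh(c: Iterator[int], name: str) -> str:
    return f"?{name}{next(c)}"


def q_eps(x: str, y: str, c: Iterator[int]) -> str:
    """Q_eps(x, y): x and y denote the same term occurring in some dereferenced document."""
    f, p, o, s = (fresh(c, n) for n in "fpos")
    eq = f"FILTER ({x} = {y})"
    P = (f"(({x}, {p}, {o}) AND ({y}, {p}, {o}) {eq}) UNION "
         f"(({s}, {x}, {o}) AND ({s}, {y}, {o}) {eq}) UNION "
         f"(({s}, {p}, {x}) AND ({s}, {p}, {y}) {eq})")
    return f"pi[{x},{y}](SEED {f} <eps, {P}>)"


def translate(r, x: str = "?x", y: str = "?y", c: Optional[Iterator[int]] = None) -> str:
    """Build Q_r(x, y) as a string, introducing fresh variables for all other positions."""
    c = c if c is not None else itertools.count(1)
    if isinstance(r, IRI):
        return f"(SEED {x} <eps, ({x}, {r.iri}, {y})>)"
    if isinstance(r, NegSet):
        p = fresh(c, "p")
        cond = " && ".join(f"{p} != {u}" for u in r.excluded)
        return f"(SEED {x} <eps, (({x}, {p}, {y}) FILTER ({cond}))>)"
    if isinstance(r, SeqP):
        z = fresh(c, "z")
        return f"pi[{x},{y}]({translate(r.r1, x, z, c)} AND {translate(r.r2, z, y, c)})"
    if isinstance(r, AltP):
        return f"({translate(r.r1, x, y, c)} UNION {translate(r.r2, x, y, c)})"
    if isinstance(r, StarP):
        v, g, z = fresh(c, "v"), fresh(c, "f"), fresh(c, "z")
        qs = f"(<eps, (GRAPH {g} {{}})> AND {translate(r.r1, g, v, c)})"  # Q_s(?v)
        step = f"(SEED {x} <<{v}, {qs}>*, (GRAPH {z} {{}})>)"
        return f"({q_eps(x, y, c)} UNION pi[{x},{y}]({step} AND {translate(r.r1, z, y, c)}))"
    raise TypeError(f"unknown path expression: {r!r}")
```

For instance, `translate(SeqP(IRI("ex:a"), IRI("ex:b")))` produces the projection of the join of two SEED queries that share the fresh variable `?z1`, exactly as in the case $r = r_1/r_2$.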


We prove now that for every property path pattern $(?x, r, ?y)$ we have that

$$[\![(?x, r, ?y)]\!]^W = [\![Q_r(?x, ?y)]\!]^W_S.$$

The proof is by induction on the construction of $Q_r(?x, ?y)$. We proceed by cases.

- Assume that $r \in \mathcal{U}$. Then $\mu \in [\![Q_r(?x, ?y)]\!]^W_S$ if and only if

$$\mu \in [\![(\mathsf{SEED}\ ?x\ \langle\varepsilon, (?x, r, ?y)\rangle)]\!]^W_S.$$

Notice that this occurs if and only if there exists a mapping $\mu'$ and a URI $u$ such that $\mu' \in [\![\langle\varepsilon, (?x, r, ?y)\rangle]\!]^W_{\{u\}}$, $\mu'$ is compatible with the mapping $\{?x \mapsto u\}$, and $\mu = \mu' \cup \{?x \mapsto u\}$. Now, given that $[\![\varepsilon]\!]^W_u = \{u\}$, we have that $\mu' \in [\![\langle\varepsilon, (?x, r, ?y)\rangle]\!]^W_{\{u\}}$ if and only if $\mu' \in [\![(?x, r, ?y)]\!]^{\mathcal{D}}$, with $\mathcal{D}$ the dataset with $\mathrm{data}(\mathrm{adoc}(u))$ as default graph. With all this we have that $\mu \in [\![Q_r(?x, ?y)]\!]^W_S$ if and only if $\mathrm{dom}(\mu) = \{?x, ?y\}$, $\mu(?x) \in \mathrm{dom}(\mathrm{adoc})$, and $(\mu(?x), r, \mu(?y)) \in \mathrm{data}(\mathrm{adoc}(\mu(?x)))$, which is exactly the property

$$\mu((?x, r, ?y)) \in C^W(\mu(?x)).$$

This last property holds if and only if $\mu \in [\![(?x, r, ?y)]\!]^W$.

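- For the case in which $r = !(u_1 | \cdots | u_k)$ with $u_i \in \mathcal{U}$, the proof is similar. We have that

$$\mu \in [\![(\mathsf{SEED}\ ?x\ \langle\varepsilon, ((?x, ?p, ?y)\ \mathsf{FILTER}\ (?p \neq u_1 \wedge \cdots \wedge ?p \neq u_k))\rangle)]\!]^W_S$$

if and only if $\mu$ is in the evaluation of

$$((?x, ?p, ?y)\ \mathsf{FILTER}\ (?p \neq u_1 \wedge \cdots \wedge ?p \neq u_k))$$

over the graph $\mathrm{data}(\mathrm{adoc}(\mu(?x)))$. This happens if and only if

$$\mu((?x, p, ?y)) \in C^W(\mu(?x)) \text{ for some } p \notin \{u_1, \ldots, u_k\},$$

which is exactly the property

$$\mu \in [\![(?x, !(u_1 | \cdots | u_k), ?y)]\!]^W.$$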

- For the cases $r = r_1/r_2$ and $r = r_1|r_2$, the semantics of the corresponding LDQL query exactly matches the semantics of the property path expression. Just notice that the semantics of AND is that of the join, and the semantics of UNION is that of the set union.

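- For the case of $r = r_1^*$, we have that $\mu \in [\![(?x, r_1^*, ?y)]\!]^W$ if and only if $\mathrm{dom}(\mu) = \{?x, ?y\}$ and (i) $\mu(?x) = \mu(?y)$ and $\mu(?x), \mu(?y) \in \mathrm{terms}(W)$, or (ii) $\mu \in [\![(?x, r_1^k, ?y)]\!]^W$ for some $k > 0$. For case (i) it is easy to see that $\mu \in [\![Q_\varepsilon(?x, ?y)]\!]^W_S$. Just notice that if $\mu(?x)$ is in $\mathrm{terms}(W)$, then there exists a URI $u \in \mathrm{dom}(\mathrm{adoc})$ and a triple $t$ in $\mathrm{data}(\mathrm{adoc}(u))$ such that $\mu(?x)$ appears in $t$. If $\mu(?x)$ appears in the subject position, then we know that $\mu$ is compatible with a mapping in $[\![((?x, ?p, ?o)\ \mathsf{AND}\ (?y, ?p, ?o)\ \mathsf{FILTER}\ (?x = ?y))]\!]^{\mathcal{D}}$, where $\mathcal{D}$ is a dataset with a default graph in which $t$ appears. Finally, given that $Q_\varepsilon(?x, ?y) = \pi_{\{?x,?y\}}(\mathsf{SEED}\ ?f\ \langle\varepsilon, P\rangle)$ and we know that $u$ is a possible value for variable $?f$, we obtain that $\mu \in [\![Q_\varepsilon(?x, ?y)]\!]^W_S$. If $\mu(?x)$ appears in the predicate or object position, the proof is similar. For case (ii) we will show that

$$\mu \in [\![Q_\varepsilon(?x, ?y)]\!]^W_S \cup \bigcup_{i=0}^{k-1} [\![\pi_{\{?x,?y\}}((\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^i, (\mathsf{GRAPH}\ ?z\ \{\})\rangle)\ \mathsf{AND}\ Q_{r_1}(?z, ?y))]\!]^W_S \qquad (9)$$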

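We will use an inductive argument on $k$. Assume $k = 1$; then $\mu \in [\![(?x, r_1, ?y)]\!]^W$. By the induction hypothesis on the construction of property paths, we have that $\mu \in [\![Q_{r_1}(?x, ?y)]\!]^W_S$. Now, if $\mu(?x) \notin \mathrm{dom}(\mathrm{adoc})$, then we have that $\mu(?x) = \mu(?y)$ and thus $\mu \in [\![Q_\varepsilon(?x, ?y)]\!]^W_S$. If $\mu(?x) \in \mathrm{dom}(\mathrm{adoc})$, then it is easy to see that

$$\mu \in [\![\pi_{\{?x,?y\}}((\mathsf{SEED}\ ?x\ \langle\varepsilon, (\mathsf{GRAPH}\ ?z\ \{\})\rangle)\ \mathsf{AND}\ Q_{r_1}(?z, ?y))]\!]^W_S.$$

This is because $\{?x \mapsto \mu(?x), ?z \mapsto \mu(?x)\} \in [\![(\mathsf{SEED}\ ?x\ \langle\varepsilon, (\mathsf{GRAPH}\ ?z\ \{\})\rangle)]\!]^W_S$, which is compatible with the mapping $\{?z \mapsto \mu(?x), ?y \mapsto \mu(?y)\} \in [\![Q_{r_1}(?z, ?y)]\!]^W_S$. Now assume that $\mu \in [\![(?x, r_1^{k+1}, ?y)]\!]^W$; thus

$$\mu \in \pi_{\{?x,?y\}}([\![(?x, r_1^k, ?z)]\!]^W \bowtie [\![(?z, r_1, ?y)]\!]^W).$$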

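By induction hypothesis we have that

$$\mu \in \pi_{\{?x,?y\}}\Big(\Big([\![Q_\varepsilon(?x, ?z)]\!]^W_S \cup \bigcup_{i=0}^{k-1} [\![\pi_{\{?x,?z\}}((\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^i, (\mathsf{GRAPH}\ ?u\ \{\})\rangle)\ \mathsf{AND}\ Q_{r_1}(?u, ?z))]\!]^W_S\Big) \bowtie [\![Q_{r_1}(?z, ?y)]\!]^W_S\Big).$$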

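Then we know that there exists $j$ with $0 \leq j \leq k - 1$ such that

$$\mu \in \pi_{\{?x,?y\}}\Big(\Big([\![Q_\varepsilon(?x, ?z)]\!]^W_S \cup [\![\pi_{\{?x,?z\}}((\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^j, (\mathsf{GRAPH}\ ?u\ \{\})\rangle)\ \mathsf{AND}\ Q_{r_1}(?u, ?z))]\!]^W_S\Big) \bowtie [\![Q_{r_1}(?z, ?y)]\!]^W_S\Big).$$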


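If $\mu \in \pi_{\{?x,?y\}}([\![Q_\varepsilon(?x, ?z)]\!]^W_S \bowtie [\![Q_{r_1}(?z, ?y)]\!]^W_S) = [\![Q_{r_1}(?x, ?y)]\!]^W_S$, we can apply the same argument as in the base case. Now assume that

$$\mu \in \pi_{\{?x,?y\}}([\![\pi_{\{?x,?z\}}((\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^j, (\mathsf{GRAPH}\ ?u\ \{\})\rangle)\ \mathsf{AND}\ Q_{r_1}(?u, ?z))]\!]^W_S \bowtie [\![Q_{r_1}(?z, ?y)]\!]^W_S).$$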


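Then we know that there exist mappings $\mu'$ and $\mu''$ such that

$$\mu' \in [\![\pi_{\{?x,?z\}}((\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^j, (\mathsf{GRAPH}\ ?u\ \{\})\rangle)\ \mathsf{AND}\ Q_{r_1}(?u, ?z))]\!]^W_S$$

and

$$\mu'' \in [\![Q_{r_1}(?z, ?y)]\!]^W_S,$$

and $\mu$ equals $\mu' \cup \mu''$ restricted to variables $?x, ?y$. Notice that $\mu''$ is compatible with $\mu'$; thus, we have that $\mu'(?z) = \mu''(?z)$. Now, if $\mu''(?z) \notin \mathrm{dom}(\mathrm{adoc})$, since $\mu'' \in [\![Q_{r_1}(?z, ?y)]\!]^W_S$, then necessarily $\mu''(?z) = \mu''(?y)$, and given that $\mu'$ is compatible with $\mu''$ we obtain that $\mu'(?z) = \mu''(?y)$. All this implies that

$$\mu \in [\![\pi_{\{?x,?y\}}((\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^j, (\mathsf{GRAPH}\ ?u\ \{\})\rangle)\ \mathsf{AND}\ Q_{r_1}(?u, ?y))]\!]^W_S,$$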

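and thus (9) holds. Assume now that $\mu''(?z) \in \mathrm{dom}(\mathrm{adoc})$. We will prove that $\mu' \in [\![(\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^{j+1}, (\mathsf{GRAPH}\ ?z\ \{\})\rangle)]\!]^W_S$. We know that

$$\mu' \in [\![\pi_{\{?x,?z\}}((\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^j, (\mathsf{GRAPH}\ ?u\ \{\})\rangle)\ \mathsf{AND}\ Q_{r_1}(?u, ?z))]\!]^W_S.$$

Thus $\mu'$ equals $\mu_1 \cup \mu_2$ (restricted to variables $?x, ?z$), where

$$\mu_1 \in [\![(\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^j, (\mathsf{GRAPH}\ ?u\ \{\})\rangle)]\!]^W_S$$

and

$$\mu_2 \in [\![Q_{r_1}(?u, ?z)]\!]^W_S.$$

Thus, regarding $\mu_1$, we know that there exists a sequence of URIs $u_1, u_2, \ldots, u_j$ such that $\mu_1(?x) = u_1$, $\mu_1(?u) = u_j$, and $u_{i+1} \in [\![\langle ?v, Q_s(?v)\rangle]\!]^W_{u_i}$ for $1 \leq i < j$. Now, recall that the definition of $Q_s(?v)$ is

$$Q_s(?v) = (\langle\varepsilon, (\mathsf{GRAPH}\ ?f\ \{\})\rangle\ \mathsf{AND}\ Q_{r_1}(?f, ?v)).$$

Then essentially what we have is that

$$\{?f \mapsto u_i, ?v \mapsto u_{i+1}\} \in [\![Q_{r_1}(?f, ?v)]\!]^W_S.$$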


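Moreover, since $\mu_1$ and $\mu_2$ are compatible, we know that $\mu_1(?u) = \mu_2(?u) = u_j$, and since $\mu_2 \in [\![Q_{r_1}(?u, ?z)]\!]^W_S$, we know that

$$\{?f \mapsto u_j, ?v \mapsto \mu_2(?z)\} \in [\![Q_{r_1}(?f, ?v)]\!]^W_S.$$

Finally, given that we are assuming that $\mu''(?z) = \mu_2(?z)$ is in $\mathrm{dom}(\mathrm{adoc})$, we have that

$$\{?x \mapsto \mu_1(?x), ?z \mapsto \mu_2(?z)\} \in [\![(\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^{j+1}, (\mathsf{GRAPH}\ ?z\ \{\})\rangle)]\!]^W_S,$$

which is what we wanted to prove. Thus we have that

$$\mu' \in [\![(\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^{j+1}, (\mathsf{GRAPH}\ ?z\ \{\})\rangle)]\!]^W_S$$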


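and also that

$$\mu'' \in [\![Q_{r_1}(?z, ?y)]\!]^W_S,$$

and given that $\mu$ equals $\mu' \cup \mu''$ restricted to variables $?x, ?y$, we have that

$$\mu \in [\![\pi_{\{?x,?y\}}((\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^{j+1}, (\mathsf{GRAPH}\ ?z\ \{\})\rangle)\ \mathsf{AND}\ Q_{r_1}(?z, ?y))]\!]^W_S,$$

and since $j + 1 \leq k$ we obtain

$$\mu \in [\![Q_\varepsilon(?x, ?y)]\!]^W_S \cup \bigcup_{i=0}^{k} [\![\pi_{\{?x,?y\}}((\mathsf{SEED}\ ?x\ \langle\langle ?v, Q_s(?v)\rangle^i, (\mathsf{GRAPH}\ ?z\ \{\})\rangle)\ \mathsf{AND}\ Q_{r_1}(?z, ?y))]\!]^W_S.$$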

If one assumes that $\mu \in [\![Q_r(?x, ?y)]\!]^W_S$, then by an argument along exactly the same lines as the argument above, one can show that $\mu \in [\![(?x, r_1^*, ?y)]\!]^W$.

We have shown how to construct an equivalent LDQL query for every property path pattern with two variables. If the pattern does not have two variables, we need a slightly different construction, in particular for the case in which $(\cdot)^*$ is used. We now show the details of the construction but leave the complete proof as an exercise (it can be completed using the arguments of the previous part of this proof).

Consider a propery path pattern (a, r, 0) where a is a URI or variable, and /3 is a 
URI, variable or literal. Then for the cases r = p G U, r =!(iti| • • • \uk), r = ri/r 2 , 
r = ri\r 2 , we construct a query as Qr{a,l3) where Qr{a,l3) is query Qr{7x,7y) 
where all occurrences of 7x has been replaced by a and all occurrences of ly has been 
replaced by 0 For the case of r = r* we need to do a slightly different construction. 
For a pattern (u, r, ly) we construct a query Pr{ly) as 

(e, BIND(MAS?y)) UNION ((SEED{u} {{Iv ,Q s{1v))* , (graph Yz { }))) ANDgri(?^:,?y)) 
For a pattern {lx, r, v) we construct a query S)! (lx) as 

$(\varepsilon, \mathrm{BIND}(v\ \mathrm{AS}\ ?x))\ \mathrm{UNION}\ ((\mathrm{SEED}\ ?x\ ((?v, Q_s(?v))^*, (\mathrm{GRAPH}\ ?z\ \{\ \})))\ \mathrm{AND}\ T(?z))$

where $T(?z)$ is either $Q_{r_1}(?z, v)$ or $S_{r_1}(?z)$, depending on the form of $r_1$. Finally, for a pattern $(u, r, v)$ we construct a query as

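$(\varepsilon, (\mathrm{BIND}(u\ \mathrm{AS}\ ?x)\ \mathrm{AND}\ \mathrm{BIND}(v\ \mathrm{AS}\ ?y))\ \mathrm{FILTER}\ (?x = ?y))\ \mathrm{UNION}$

$((\mathrm{SEED}\ \{u\}\ ((?v, Q_s(?v))^*, (\mathrm{GRAPH}\ ?z\ \{\ \})))\ \mathrm{AND}\ T(?z))$

where $T(?z)$ is either $Q_{r_1}(?z, v)$ or $S_{r_1}(?z)$, depending on the form of $r_1$.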

Finally, consider a property path pattern $(\ell, r, \beta)$, where $\ell$ is a literal. Then for the cases $r = p \in \mathcal{U}$ and $r = !(u_1 | \cdots | u_k)$ we should translate it into an unsatisfiable query. One way of obtaining such a query is, for example, with the expression

$(\varepsilon, (\mathrm{BIND}(\ell\ \mathrm{AS}\ ?x)\ \mathrm{AND}\ \mathrm{BIND}(\ell\ \mathrm{AS}\ ?y))\ \mathrm{FILTER}\ (?x \neq ?y))$

For the cases $r = r_1/r_2$ and $r = r_1|r_2$ we follow the same construction as if $\ell$ were a URI, but with the last base case. For the case of $r = r_1^*$, if $\beta$ is a variable $?y$ we consider the following query

$(\varepsilon, \mathrm{BIND}(\ell\ \mathrm{AS}\ ?y)),$

and if $\beta$ is a URI or literal, the query

$(\varepsilon, (\mathrm{BIND}(\ell\ \mathrm{AS}\ ?x)\ \mathrm{AND}\ \mathrm{BIND}(\beta\ \mathrm{AS}\ ?y))\ \mathrm{FILTER}\ (?x = ?y)).$

The correctness of this translation can be proved along the same lines as for the case of property path pattern $(?x, r, ?y)$.

A.6 Proof of Theorem 3 

We proceed by induction, showing how to translate every possible NautiLOD query. The translation works in two parts. We first define the following function $\mathrm{trans}_N(\cdot)$ that, given a NautiLOD query, produces an LPE.

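$\mathrm{trans}_N(p) = (+, p, \_)$
$\mathrm{trans}_N(p^-) = (\_, p, +)$
$\mathrm{trans}_N(\langle\_\rangle) = (?x, (\varepsilon, (\mathrm{GRAPH}\ ?u\ (?u, ?p, ?x))))$
$\mathrm{trans}_N(n_1/n_2) = \mathrm{trans}_N(n_1)/\mathrm{trans}_N(n_2)$
$\mathrm{trans}_N(n_1|n_2) = \mathrm{trans}_N(n_1)|\mathrm{trans}_N(n_2)$
$\mathrm{trans}_N(n^*) = \mathrm{trans}_N(n)^*$
$\mathrm{trans}_N(n[(\mathrm{ASK}\ P)]) = \mathrm{trans}_N(n)/[(?x, (\varepsilon, (\mathrm{GRAPH}\ ?x\ P)))]$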
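To make the case table concrete, the following minimal Python sketch implements $\mathrm{trans}_N$ as a recursive function. It assumes a hypothetical AST for NautiLOD expressions (the class names are illustrative, not taken from the paper), keeps the SPARQL pattern $P$ of an ASK test as an opaque string, and renders LPEs as plain strings; extra parentheses around $|$ and $^*$ are added only so that the flat string output stays unambiguous.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical AST for NautiLOD expressions; class names are illustrative only.
@dataclass(frozen=True)
class Pred:        # p
    p: str

@dataclass(frozen=True)
class Inv:         # p^-
    p: str

@dataclass(frozen=True)
class AnyPred:     # <_>
    pass

@dataclass(frozen=True)
class Seq:         # n1/n2
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Alt:         # n1|n2
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Star:        # n*
    inner: "Expr"

@dataclass(frozen=True)
class Ask:         # n[(ASK P)]; P is kept as an opaque SPARQL string
    inner: "Expr"
    pattern: str

Expr = Union[Pred, Inv, AnyPred, Seq, Alt, Star, Ask]


def trans_n(n: Expr) -> str:
    """Render trans_N(n) as an LPE string, one branch per case of the table."""
    if isinstance(n, Pred):
        return f"(+, {n.p}, _)"
    if isinstance(n, Inv):
        return f"(_, {n.p}, +)"
    if isinstance(n, AnyPred):
        return "(?x, (ε, (GRAPH ?u (?u, ?p, ?x))))"
    if isinstance(n, Seq):
        return f"{trans_n(n.left)}/{trans_n(n.right)}"
    if isinstance(n, Alt):
        # Parentheses only keep the flat string unambiguous.
        return f"({trans_n(n.left)}|{trans_n(n.right)})"
    if isinstance(n, Star):
        return f"({trans_n(n.inner)})*"
    if isinstance(n, Ask):
        return f"{trans_n(n.inner)}/[(?x, (ε, (GRAPH ?x {n.pattern})))]"
    raise TypeError(f"not a NautiLOD expression: {n!r}")


print(trans_n(Seq(Pred("knows"), Star(Inv("author")))))  # (+, knows, _)/((_, author, +))*
```

Because each branch mirrors exactly one line of the table, the structural induction used in the correctness argument corresponds to the recursion of `trans_n`.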


Before presenting the complete translation, we prove the following result. Let $n$ be a NautiLOD expression; then for every Web of Linked Data $W$ and URIs $u, v \in \mathrm{dom}(adoc)$ we have that

$v \in [\![n]\!]^u_W$ if and only if $v \in [\![\mathrm{trans}_N(n)]\!]^u_W$.

The proof is by induction on the construction of the NautiLOD expression.


- For the case of $p \in \mathcal{U}$, we have that

$[\![p]\!]^u_W = \{v \mid (u, p, v) \in \mathrm{data}(adoc(u))\}.$

Notice that $v \in \mathrm{dom}(adoc)$ and $v \in [\![p]\!]^u_W$ if and only if there is a link from document $adoc(u)$ to document $adoc(v)$ that matches $(+, p, \_)$. This happens if and only if $v \in [\![(+, p, \_)]\!]^u_W$, which is what we wanted to prove.

- The case for $p^-$ is similar but uses $(\_, p, +)$.

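- For the case $\langle\_\rangle$, just notice that $v \in [\![\langle\_\rangle]\!]^u_W$ if and only if there exists a $p \in \mathcal{U}$ such that $(u, p, v) \in \mathrm{data}(adoc(u))$. On the other hand, we have that $v \in [\![\mathrm{trans}_N(\langle\_\rangle)]\!]^u_W = [\![(?x, (\varepsilon, (\mathrm{GRAPH}\ ?u\ (?u, ?p, ?x))))]\!]^u_W$ if and only if $\{?x \mapsto v\} \in [\![\pi_{\{?x\}}(\mathrm{GRAPH}\ ?u\ (?u, ?p, ?x))]\!]_D$, where $D = \{\mathrm{data}(adoc(u)), (u, \mathrm{data}(adoc(u)))\}$. Thus $v \in [\![\mathrm{trans}_N(\langle\_\rangle)]\!]^u_W$ if and only if there exists $p$ such that $(u, p, v) \in \mathrm{data}(adoc(u))$. This proves the desired property.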

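- For the case of an expression $n_1/n_2$, we have that $v \in \mathrm{dom}(adoc)$ is in $[\![n_1/n_2]\!]^u_W$ if and only if there exists $v' \in \mathrm{dom}(adoc)$ such that $v' \in [\![n_1]\!]^u_W$ and $v \in [\![n_2]\!]^{v'}_W$. Then we can apply our induction hypothesis, and we have that $v \in [\![n_1/n_2]\!]^u_W$ if and only if there is such a $v'$ with $v' \in [\![\mathrm{trans}_N(n_1)]\!]^u_W$ and $v \in [\![\mathrm{trans}_N(n_2)]\!]^{v'}_W$, that is, if and only if $v \in [\![\mathrm{trans}_N(n_1/n_2)]\!]^u_W$.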

- The cases $n_1|n_2$ and $n^*$ follow directly from the definitions of NautiLOD and LDQL.

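- For the case of expression $n[(\mathrm{ASK}\ P)]$, we have that $v \in [\![n[(\mathrm{ASK}\ P)]]\!]^u_W$ if and only if $v \in [\![n]\!]^u_W$, $v \in \mathrm{dom}(adoc)$ and $[\![P]\!]_{\mathrm{data}(adoc(v))} \neq \emptyset$. On the other hand, we have that $v \in [\![\mathrm{trans}_N(n[(\mathrm{ASK}\ P)])]\!]^u_W$ if and only if

$v \in [\![\mathrm{trans}_N(n)/[(?x, (\varepsilon, (\mathrm{GRAPH}\ ?x\ P)))]]\!]^u_W.$

This happens if and only if there exists a $v'$ such that $v' \in [\![\mathrm{trans}_N(n)]\!]^u_W$ and $v \in [\![[(?x, (\varepsilon, (\mathrm{GRAPH}\ ?x\ P)))]]\!]^{v'}_W$. From the last property and the semantics of $[\cdot]$ in LDQL, we have that $v = v'$ and that $[\![(?x, (\varepsilon, (\mathrm{GRAPH}\ ?x\ P)))]\!]^v_W \neq \emptyset$. The last holds if and only if $[\![\pi_{\{?x\}}(\mathrm{GRAPH}\ ?x\ P)]\!]_D \neq \emptyset$, with $D$ the RDF dataset $\{\mathrm{data}(adoc(v)), (v, \mathrm{data}(adoc(v)))\}$. Thus we have that $v \in [\![\mathrm{trans}_N(n[(\mathrm{ASK}\ P)])]\!]^u_W$ if and only if $v \in [\![\mathrm{trans}_N(n)]\!]^u_W$ and $[\![P]\!]_{\mathrm{data}(adoc(v))} \neq \emptyset$. Applying our induction hypothesis we have $v \in [\![n]\!]^u_W$ and $[\![P]\!]_{\mathrm{data}(adoc(v))} \neq \emptyset$, which is exactly what we needed to prove.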

Notice that the hypothesis $v \in \mathrm{dom}(adoc)$ was fundamental to prove the previous result. Nevertheless, the output of a NautiLOD query can be a URI not in $\mathrm{dom}(adoc)$, or even a literal, so in general we need a different translation. We now use $\mathrm{trans}_N(\cdot)$ to translate a general NautiLOD query. Given a NautiLOD expression $n$ we have two cases. Assume first that $n$, as a regular expression, does not produce the empty string $\varepsilon$. Then, by using results on regular languages, we know that we can write an equivalent expression $n'$ of the form

$n_1/e_1 \mid \cdots \mid n_k/e_k \mid m_1[(\mathrm{ASK}\ P_1)] \mid \cdots \mid m_\ell[(\mathrm{ASK}\ P_\ell)]$

where every $n_i$ and $m_j$ is a NautiLOD query, and every $e_i$ is either of the form $p$, or $p^-$, or $\langle\_\rangle$. We are now ready to produce an LDQL query $Q_n(?x)$ which is equivalent to $n$. The query is constructed as follows.
The query is constructed as follows. 


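$Q_n(?x) = \pi_{\{?x\}}\big((\mathrm{trans}_N(n_1), Q_1)\ \mathrm{UNION}\ \cdots\ \mathrm{UNION}\ (\mathrm{trans}_N(n_k), Q_k)\ \mathrm{UNION}$
$(\mathrm{trans}_N(m_1), (\mathrm{GRAPH}\ ?x\ P_1))\ \mathrm{UNION}\ \cdots\ \mathrm{UNION}\ (\mathrm{trans}_N(m_\ell), (\mathrm{GRAPH}\ ?x\ P_\ell))\big)$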

where query $Q_i$ depends on the form of $e_i$:

- if $e_i = p$, then $Q_i = (\mathrm{GRAPH}\ ?u\ (?u, p, ?x))$;

- if $e_i = p^-$, then $Q_i = (\mathrm{GRAPH}\ ?u\ (?x, p, ?u))$;

- if $e_i = \langle\_\rangle$, then $Q_i = (\mathrm{GRAPH}\ ?u\ (?u, ?p, ?x))$.
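The case distinction for $Q_i$ is mechanical, so the assembly of $Q_n(?x)$ can be sketched in a few lines of Python. This is an illustration under stated assumptions: the LPEs $\mathrm{trans}_N(n_i)$ and $\mathrm{trans}_N(m_j)$ are taken as already-rendered strings, the outer projection is written as `π_{?x}`, and the helper names `q_i` and `q_n` are hypothetical.

```python
def q_i(kind: str, p: str = "") -> str:
    """SPARQL part Q_i for the last step e_i of a disjunct n_i/e_i."""
    if kind == "pred":   # e_i = p
        return f"(GRAPH ?u (?u, {p}, ?x))"
    if kind == "inv":    # e_i = p^-
        return f"(GRAPH ?u (?x, {p}, ?u))"
    if kind == "any":    # e_i = <_>
        return "(GRAPH ?u (?u, ?p, ?x))"
    raise ValueError(f"unexpected last step: {kind}")


def q_n(path_disjuncts, ask_disjuncts) -> str:
    """Assemble Q_n(?x) from already-translated LPEs.

    path_disjuncts: list of (trans_N(n_i), kind of e_i, predicate or "")
    ask_disjuncts:  list of (trans_N(m_j), P_j as a SPARQL string)
    """
    parts = [f"({lpe}, {q_i(kind, p)})" for lpe, kind, p in path_disjuncts]
    parts += [f"({lpe}, (GRAPH ?x {pattern}))" for lpe, pattern in ask_disjuncts]
    return "π_{?x}(" + " UNION ".join(parts) + ")"


print(q_n([("(+, a, _)", "pred", "b")], [("(+, c, _)", "{?s ?p ?o}")]))
```

Note that every $Q_i$ binds $?x$ to the final node of the path, which is why the single projection onto $?x$ suffices for all disjuncts.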

Now, to prove the correctness of our construction, assume that $v \in [\![n]\!]^u_W$. Then we know that $v \in [\![n_i/e_i]\!]^u_W$ or $v \in [\![m_i[(\mathrm{ASK}\ P_i)]]\!]^u_W$ for some $i$. If $v \in [\![n_i/e_i]\!]^u_W$, we know that there exists a $v'$ such that $v' \in [\![n_i]\!]^u_W$ and $v \in [\![e_i]\!]^{v'}_W$. Notice that, since $v \in [\![e_i]\!]^{v'}_W$ and $e_i$ is either $p$, or $p^-$, or $\langle\_\rangle$, we know that $v'$ is in $\mathrm{dom}(adoc)$. Thus we can apply our previous result to conclude from $v' \in [\![n_i]\!]^u_W$ that $v' \in [\![\mathrm{trans}_N(n_i)]\!]^u_W$.

Now if $e_i = p$, then from $v \in [\![e_i]\!]^{v'}_W$ we conclude that $(v', p, v) \in \mathrm{data}(adoc(v'))$, and thus $[\![(?u, p, ?x)]\!]_{\mathrm{data}(adoc(v'))}$ contains the mapping $\mu = \{?u \mapsto v', ?x \mapsto v\}$; then $[\![(\mathrm{GRAPH}\ ?u\ (?u, p, ?x))]\!]_D$ has $\mu$ as a solution, with $D = \{\mathrm{data}(adoc(v')), (v', \mathrm{data}(adoc(v')))\}$. Given that $v' \in [\![\mathrm{trans}_N(n_i)]\!]^u_W$, we have that

$\mu = \{?u \mapsto v', ?x \mapsto v\} \in [\![(\mathrm{trans}_N(n_i), Q_i)]\!]^{\{u\}}_W.$

Finally, given that $Q_n(?x)$ only keeps the variable $?x$, we have that $\{?x \mapsto v\}$ is in $[\![Q_n(?x)]\!]^{\{u\}}_W$, which is what we wanted to show. If $e_i = p^-$ or $e_i = \langle\_\rangle$, the proof is essentially the same.

Now assume that $v \in [\![m_i[(\mathrm{ASK}\ P_i)]]\!]^u_W$. This implies that $v$ is in $[\![m_i]\!]^u_W$ and that $[\![P_i]\!]_{\mathrm{data}(adoc(v))} \neq \emptyset$. By the semantics of NautiLOD, we have that $v$ is in $\mathrm{dom}(adoc)$ (otherwise we would not have been able to evaluate $P_i$), and thus we can apply our result above to obtain that $v \in [\![\mathrm{trans}_N(m_i)]\!]^u_W$. Now, given that $[\![P_i]\!]_{\mathrm{data}(adoc(v))} \neq \emptyset$, we have that $[\![(\mathrm{GRAPH}\ ?x\ P_i)]\!]_D \neq \emptyset$, where $D = \{\mathrm{data}(adoc(v)), (v, \mathrm{data}(adoc(v)))\}$. Moreover, every mapping $\mu$ in $[\![(\mathrm{GRAPH}\ ?x\ P_i)]\!]_D$ is such that $\mu(?x) = v$.

All these facts imply that the mapping $\mu' = \{?x \mapsto v\}$ is in $[\![\pi_{\{?x\}}(\mathrm{trans}_N(m_i), (\mathrm{GRAPH}\ ?x\ P_i))]\!]^{\{u\}}_W$, and thus $\mu'$ is in $[\![Q_n(?x)]\!]^{\{u\}}_W$, which is exactly what we wanted to prove.

If we start by assuming that $\mu = \{?x \mapsto v\}$ is in $[\![Q_n(?x)]\!]^{\{u\}}_W$, then following a similar reasoning as above one concludes that $v \in [\![n]\!]^u_W$.

To complete the proof we have to cover the case in which $n$, as a regular expression, can produce the empty string. Then, by applying some classical properties of regular languages, one can rewrite $n$ as $\varepsilon | n'$ with $n'$ an expression that does not produce the empty string $\varepsilon$. Thus we can translate $n$ into the LDQL query

$(\varepsilon, (\mathrm{GRAPH}\ ?x\ \{\ \}))\ \mathrm{UNION}\ Q_{n'}(?x)$

Notice that for every $u \in \mathrm{dom}(adoc)$ we have that $[\![(\varepsilon, (\mathrm{GRAPH}\ ?x\ \{\ \}))]\!]^{\{u\}}_W$ results in a single mapping $\mu = \{?x \mapsto u\}$.


A.7 Proof of Theorem 4 

Recall that NautiLOD can only express paths, and no combination of those paths via SPARQL operators is allowed. Thus, it is easy to prove that NautiLOD cannot express operators such as SEED, AND, and UNION that are natively allowed in LDQL. To make a stronger claim, we prove that there exists a simple LDQL query, not using the mentioned operators, that cannot be expressed using NautiLOD. The proof is similar to the proof of Theorem 1.

Thus, consider the LDQL query $Q(?x)$ given by

$((+, p, \_), (?x, ?x, ?x))$

with $p \in \mathcal{U}$. Now assume that there exists a NautiLOD expression $n$ such that

$[\![n]\!]^v_W = \{\mu(?x) \mid \mu \in [\![Q(?x)]\!]^{\{v\}}_W\}$

for every Web of Linked Data $W$ and $v \in \mathrm{dom}(adoc)$. Let $u, u', a, b$ be different elements in $\mathcal{U}$ that are not mentioned in $n$. Consider now $W_1$ having only two documents $d_1 = \{(u, p, u')\}$ and $d_2 = \{(a, a, a)\}$ such that $adoc(u) = d_1$ and $adoc(u') = d_2$. Moreover, consider $W_2$ having also two documents $d_1 = \{(u, p, u')\}$ and $d_3 = \{(b, b, b)\}$ such that $adoc(u) = d_1$ and $adoc(u') = d_3$. First notice that

$[\![Q(?x)]\!]^{\{u\}}_{W_1} = \{\{?x \mapsto a\}\} \neq [\![Q(?x)]\!]^{\{u\}}_{W_2} = \{\{?x \mapsto b\}\}.$

We now prove that $[\![n]\!]^u_{W_1} = [\![n]\!]^u_{W_2}$, which is a contradiction. To prove this, we show that for every subexpression $e$ of $n$, and for every possible URI $v$, it holds that $[\![e]\!]^v_{W_1} = [\![e]\!]^v_{W_2}$. Notice that $W_1$ and $W_2$ have only two URIs in $\mathrm{dom}(adoc)$, namely $u$ and $u'$; thus, we only have to reason about the cases in which $v = u$ or $v = u'$. We proceed by induction.

- Assume that $e = r \in \mathcal{U}$. Given that in $W_1$ and $W_2$ the URI $u$ is associated with the same document (document $d_1$), we have $[\![r]\!]^u_{W_1} = [\![r]\!]^u_{W_2}$. Moreover, given that $r \neq a$ and $r \neq b$ (recall that $n$ does not mention $a$ or $b$), we have that $[\![r]\!]^{u'}_{W_1} = [\![r]\!]^{u'}_{W_2} = \emptyset$.

- Assume that $e = r^-$ with $r \in \mathcal{U}$. Exactly the same argument as in the above case applies.

- Assume that $e = \langle\_\rangle$. For the same reason as in the above two cases we have that $[\![\langle\_\rangle]\!]^u_{W_1} = [\![\langle\_\rangle]\!]^u_{W_2}$. Now consider $[\![\langle\_\rangle]\!]^{u'}_{W_1}$. A URI $v$ is in $[\![\langle\_\rangle]\!]^{u'}_{W_1}$ if and only if there exists some $p$ such that $(u', p, v) \in \mathrm{data}(adoc(u'))$, but the only triple in $\mathrm{data}(adoc(u'))$ is $(a, a, a)$, and since $a \neq u'$ we have that $[\![\langle\_\rangle]\!]^{u'}_{W_1} = \emptyset$. For a similar reason we obtain that $[\![\langle\_\rangle]\!]^{u'}_{W_2} = \emptyset$, completing this part of the proof.

- The cases $e = r_1/r_2$, $e = r_1|r_2$ and $e = r^*$ follow from the base cases proved above.


- Assume $e = r[(\mathrm{ASK}\ P)]$. By definition we have that

$[\![r[(\mathrm{ASK}\ P)]]\!]^v_W = \{v' \mid v' \in [\![r]\!]^v_W,\ v' \in \mathrm{dom}(adoc) \text{ and } [\![P]\!]_{\mathrm{data}(adoc(v'))} \neq \emptyset\}.$

By induction hypothesis we have that $[\![r]\!]^v_{W_1} = [\![r]\!]^v_{W_2}$ for $v = u, u'$. Thus we only need to prove that the evaluation of $P$ is always the same. Given that $\mathrm{data}(adoc(u))$ is the same document in $W_1$ and $W_2$, the property holds for $u$. Now consider $[\![P]\!]_{d_2}$ and $[\![P]\!]_{d_3}$ with $d_2 = \{(a, a, a)\}$ and $d_3 = \{(b, b, b)\}$. Recall that $P$ does not mention $a$ or $b$; thus, if $\mu \in [\![P]\!]_{d_2}$, then the mapping $\mu'$ obtained from $\mu$ by replacing every occurrence of $a$ by $b$ is in $[\![P]\!]_{d_3}$, and vice versa. Thus we have that $[\![P]\!]_{d_2} = \emptyset$ if and only if $[\![P]\!]_{d_3} = \emptyset$. This proves that $[\![r[(\mathrm{ASK}\ P)]]\!]^v_{W_1} = [\![r[(\mathrm{ASK}\ P)]]\!]^v_{W_2}$ for $v = u, u'$.

We have finished the proof that $[\![n]\!]^u_{W_1} = [\![n]\!]^u_{W_2}$, contradicting the fact that $n$ is equivalent to $Q(?x)$.
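The first observation of this proof, that $Q(?x)$ separates $W_1$ from $W_2$ when started from $u$, can be replayed on a toy model. The Python sketch below assumes that a Web is a dict from a URI to the set of triples of its document, that every term is a URI, and that a link pattern reaches the wildcard positions of matching triples whose URI has a document; it reproduces only this observation, not the induction over subexpressions of $n$.

```python
# Toy Webs of Linked Data from the proof: each maps a URI to the triples of adoc(URI).
W1 = {"u": {("u", "p", "u'")}, "u'": {("a", "a", "a")}}
W2 = {"u": {("u", "p", "u'")}, "u'": {("b", "b", "b")}}


def follow_link_pattern(web, ctx, lp):
    """URIs reached from context URI ctx via link pattern lp ('+' = ctx, '_' = wildcard)."""
    pattern = tuple(ctx if y == "+" else y for y in lp)
    reached = set()
    for triple in web.get(ctx, set()):
        if all(y == "_" or y == x for y, x in zip(pattern, triple)):
            # Targets are the wildcard positions whose URI can be looked up.
            reached |= {x for y, x in zip(pattern, triple) if y == "_" and x in web}
    return reached


def eval_q(web, seed, p="p"):
    """Q(?x) = ((+, p, _), (?x, ?x, ?x)): the ?x-values of the matching triples."""
    docs = follow_link_pattern(web, seed, ("+", p, "_"))
    data = set().union(*(web[u] for u in docs))
    return {s for (s, pred, o) in data if s == pred == o}


print(eval_q(W1, "u"), eval_q(W2, "u"))  # {'a'} {'b'}
```

The document of $u$ is identical in both Webs; only the content behind $u'$ differs, and it is precisely this content that $Q(?x)$ inspects while a NautiLOD expression that does not mention $a$ or $b$ cannot.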

A.8 Proof of Theorem 5

Let $P$ be an arbitrary SPARQL graph pattern, let $W = (D, adoc)$ be an arbitrary Web of Linked Data, and let $S$ be some finite set of URIs. To prove the theorem we use the (basic) LDQL queries $(lpe^{c_{\mathrm{All}}}, P)$, $(lpe^{c_{\mathrm{None}}}, P)$, and $(lpe^{c_{\mathrm{Match}}}, P)$, with the following LPEs:

- $lpe^{c_{\mathrm{All}}}$ is $(\_, \_, \_)^*$;

- $lpe^{c_{\mathrm{None}}}$ is $\varepsilon$;

- $lpe^{c_{\mathrm{Match}}}$ is $((?s, q_1) | (?p, q_1) | (?o, q_1) | \cdots | (?s, q_m) | (?p, q_m) | (?o, q_m))^*$, where $?s$, $?p$ and $?o$ are fresh variables (not used in $P$), $m$ is the number of triple patterns in $P$, and for each such triple pattern $tp_k$ ($1 \leq k \leq m$) there exists a subquery $q_k$ of the form $(\varepsilon, P_k)$ with a SPARQL pattern $P_k$ that is constructed as follows: $P_k$ contains the triple pattern $(?s, ?p, ?o)$ and, depending on the form of the corresponding triple pattern $tp_k = (s_k, p_k, o_k)$, may contain additional FILTER operators; in particular, if $s_k \notin \mathcal{V}$, then $P_k$ contains FILTER $?s = s_k$; if $p_k \notin \mathcal{V}$, then $P_k$ contains FILTER $?p = p_k$; and if $o_k \notin \mathcal{V}$, then $P_k$ contains FILTER $?o = o_k$.
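The construction of the subqueries $q_k$ can be illustrated with a short Python sketch that renders $P_k$ and $lpe^{c_{\mathrm{Match}}}$ as strings. The string syntax, the convention that variables start with `?`, and the helper names are assumptions made for illustration; like the FILTERs listed above, the check `passes_pk` compares only the constant positions of $tp_k$.

```python
def is_var(term: str) -> bool:
    """Variables are written with a leading '?' in this sketch."""
    return term.startswith("?")


def build_pk(tp) -> str:
    """Render P_k for tp_k = (s_k, p_k, o_k): (?s, ?p, ?o) plus one FILTER per constant."""
    fresh = ("?s", "?p", "?o")
    filters = [f"FILTER ({v} = {c})" for v, c in zip(fresh, tp) if not is_var(c)]
    return " ".join(["(?s, ?p, ?o)"] + filters)


def passes_pk(triple, tp) -> bool:
    """True iff binding (?s, ?p, ?o) to `triple` satisfies every FILTER of P_k."""
    return all(is_var(c) or c == x for c, x in zip(tp, triple))


def lpe_c_match(patterns) -> str:
    """Render lpe^{c_Match} = ((?s, q_1) | (?p, q_1) | (?o, q_1) | ...)^*."""
    steps = [f"({v}, (ε, {build_pk(tp)}))" for tp in patterns for v in ("?s", "?p", "?o")]
    return "(" + " | ".join(steps) + ")*"


tp1 = ("?a", "ex:knows", "?b")
print(build_pk(tp1))  # (?s, ?p, ?o) FILTER (?p = ex:knows)
print(lpe_c_match([tp1]))
```

Each triple pattern contributes three alternatives, one per position of the matching triple from which the next URI to be looked up is taken.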

Then, for each reachability criterion $c \in \{c_{\mathrm{All}}, c_{\mathrm{None}}, c_{\mathrm{Match}}\}$ with its corresponding LPE $lpe^c$ as specified above, we have to show the following equivalence:

$[\![P]\!]^{(c, S)}_W = [\![(lpe^c, P)]\!]^S_W. \qquad (10)$

By the definition of the reachability-based query semantics (cf. Section 4.3) and the definition of LDQL query semantics (cf. Definition 5), it is sufficient to prove the following lemma to show that (10) holds for each $c \in \{c_{\mathrm{All}}, c_{\mathrm{None}}, c_{\mathrm{Match}}\}$.

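Lemma 4. For each $c \in \{c_{\mathrm{All}}, c_{\mathrm{None}}, c_{\mathrm{Match}}\}$, the set of all documents that are $(c, S, P)$-reachable in $W$ is equivalent to the following set of documents:

$D^c_{\mathrm{LPE}} = \{adoc(u) \mid u \in [\![lpe^c]\!]^{u_{\mathrm{ctx}}}_W \text{ for some } u_{\mathrm{ctx}} \in S\}.$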


Notice that for each $c \in \{c_{\mathrm{All}}, c_{\mathrm{None}}, c_{\mathrm{Match}}\}$, the set $D^c_{\mathrm{LPE}}$ is the set of documents selected by evaluating $lpe^c$ over $W$ using every URI in $S$ as context URI. In the following, we prove Lemma 4 for each of the three reachability criteria, $c_{\mathrm{All}}$, $c_{\mathrm{None}}$, and $c_{\mathrm{Match}}$.
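Before the formal argument, Lemma 4 can be sanity-checked on a small example. The Python sketch below is a toy model, not the paper's formal machinery: a Web is a dict from URI to a frozenset of triples, every term is treated as a URI, link graph edges are taken to be $(d, (t, u), adoc(u))$ for every $u \in \mathrm{uris}(t)$ that has a document, and $c_{\mathrm{Match}}$ is simplified to comparing the constant positions of each triple pattern (repeated variables are not checked). It computes $(c, S, P)$-reachable documents by breadth-first search and, for $c_{\mathrm{All}}$, compares them with $D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$ obtained as a fixed point of $(\_, \_, \_)^*$.

```python
from collections import deque

# Toy Web: URI -> frozenset of triples of adoc(URI); every term is treated as a URI.
WEB = {
    "u": frozenset({("u", "p", "v")}),
    "v": frozenset({("v", "q", "w")}),
    "w": frozenset({("a", "a", "a")}),
    "z": frozenset({("z", "z", "z")}),
}


def reachable_docs(web, seeds, criterion):
    """(c, S, P)-reachable documents: seed documents, then links (t, u) accepted by c."""
    queue = deque(u for u in seeds if u in web)
    visited = set(queue)
    docs = set()
    while queue:
        doc = web[queue.popleft()]
        docs.add(doc)
        for t in doc:
            for target in set(t):  # uris(t), restricted below to URIs with a document
                if target in web and target not in visited and criterion(t, target):
                    visited.add(target)
                    queue.append(target)
    return docs


def c_all(t, u):
    return True


def c_none(t, u):
    return False


def c_match(patterns):
    """Simplified c_Match: some triple pattern agrees with t on all its constants."""
    def criterion(t, u):
        return any(all(c.startswith("?") or c == x for c, x in zip(tp, t)) for tp in patterns)
    return criterion


def d_lpe_all(web, seeds):
    """D^{c_All}_LPE: documents of all URIs in [[(_, _, _)*]] from some seed URI."""
    reached = {u for u in seeds if u in web}
    frontier = set(reached)
    while frontier:
        frontier = {x for u in frontier for t in web[u] for x in t if x in web} - reached
        reached |= frontier
    return {web[u] for u in reached}


assert reachable_docs(WEB, {"u"}, c_all) == d_lpe_all(WEB, {"u"})  # Lemma 4 for c_All
```

In this example the document of `z` is never reached, $c_{\mathrm{None}}$ yields only the seed document, and $c_{\mathrm{Match}}$ with the single pattern `("?x", "p", "?y")` stops after following the `p`-triple.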


$c_{\mathrm{All}}$-semantics: To prove Lemma 4 for $c_{\mathrm{All}}$ we show that the set $D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$ is both a subset and a superset of the set of all $(c_{\mathrm{All}}, S, P)$-reachable documents in $W$.

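We begin with the former. Hence, for an arbitrary document in $D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$ we have to show that this document is $(c_{\mathrm{All}}, S, P)$-reachable in $W$. Let $d_{\mathrm{LPE}} \in D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$ be such a document. Since $d_{\mathrm{LPE}} \in D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$, we know that there exist two URIs, $u_{\mathrm{ctx}}$ and $u$, such that

- $u_{\mathrm{ctx}} \in S$,

- $u \in [\![lpe^{c_{\mathrm{All}}}]\!]^{u_{\mathrm{ctx}}}_W$, and

- $d_{\mathrm{LPE}} = adoc(u)$.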


Then, either $u_{\mathrm{ctx}} = u$ or $u_{\mathrm{ctx}} \neq u$. In the following, we discuss these two cases.

If $u_{\mathrm{ctx}} = u$, then $d_{\mathrm{LPE}} = adoc(u_{\mathrm{ctx}})$ and, thus, document $d_{\mathrm{LPE}}$ is $(c_{\mathrm{All}}, S, P)$-reachable in $W$ because it satisfies the first of the two alternative conditions for reachability as given in Section 4.3.

If $u_{\mathrm{ctx}} \neq u$, then, given that $u \in [\![lpe^{c_{\mathrm{All}}}]\!]^{u_{\mathrm{ctx}}}_W$, there exists a nonempty sequence of link graph edges

$(d_1, (t_1, u_1), d'_1) \in G_W,\ (d_2, (t_2, u_2), d'_2) \in G_W,\ \ldots,\ (d_n, (t_n, u_n), d'_n) \in G_W$

such that

- $d_1 = adoc(u_{\mathrm{ctx}})$,

- $d'_i = d_{i+1}$ for all $i \in \{1, \ldots, n-1\}$, and

- $d'_n = d_{\mathrm{LPE}}$ (and $u_n = u$).

Then, since $d_1 = adoc(u_{\mathrm{ctx}})$ and $u_{\mathrm{ctx}} \in S$, we have that document $d_1$ is $(c_{\mathrm{All}}, S, P)$-reachable in $W$ (the document satisfies the first of the two conditions for reachability as given in Section 4.3). As a consequence, we can use the fact that $d'_i = d_{i+1}$ for all $i \in \{1, \ldots, n-1\}$ to show that all other documents connected by the sequence of link graph edges are also $(c_{\mathrm{All}}, S, P)$-reachable in $W$ (they satisfy the second condition). Therefore, due to $d'_n = d_{\mathrm{LPE}}$, document $d_{\mathrm{LPE}}$ is $(c_{\mathrm{All}}, S, P)$-reachable in $W$.

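After showing that in both cases, $u_{\mathrm{ctx}} = u$ and $u_{\mathrm{ctx}} \neq u$, document $d_{\mathrm{LPE}} \in D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$ is $(c_{\mathrm{All}}, S, P)$-reachable in $W$, we conclude that the set $D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$ is a subset of the set of all $(c_{\mathrm{All}}, S, P)$-reachable documents in $W$. It remains to show that $D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$ is also a superset.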

To this end, let $d_R$ be a document that is $(c_{\mathrm{All}}, S, P)$-reachable in $W$. We have to show that $d_R$ is in $D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$. We note that document $d_R$ may be $(c_{\mathrm{All}}, S, P)$-reachable in $W$ because it satisfies either the first or the second of the two alternative conditions for reachability as given in Section 4.3. In the following, we discuss both cases.

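If $d_R$ satisfies the first condition, there exists a URI $u_R \in S$ such that $adoc(u_R) = d_R$. Since $lpe^{c_{\mathrm{All}}}$ is $(\_, \_, \_)^*$, we also have $u_R \in [\![lpe^{c_{\mathrm{All}}}]\!]^{u_R}_W$. Therefore, we can use URI $u_R$ as both $u_{\mathrm{ctx}}$ and $u$ in the definition of $D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$, which shows that $d_R \in D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$.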


If $d_R$ satisfies the second condition, then there exist both a seed URI $u_0 \in S$ and a nonempty sequence of link graph edges

$(d_1, (t_1, u_1), d'_1) \in G_W,\ (d_2, (t_2, u_2), d'_2) \in G_W,\ \ldots,\ (d_n, (t_n, u_n), d'_n) \in G_W$

such that

- $d_1 = adoc(u_0)$,

- $d'_i = d_{i+1}$ for all $i \in \{1, \ldots, n-1\}$, and

- $d'_n = d_R$ and, thus, $d_R = adoc(u_n)$.

Moreover, every such link graph edge $(d_j, (t_j, u_j), d'_j)$ matches link pattern $(\_, \_, \_)$ in the context of URI $u_{j-1}$ ($1 \leq j \leq n$). Therefore, since $lpe^{c_{\mathrm{All}}}$ is $(\_, \_, \_)^*$, we have $u_n \in [\![lpe^{c_{\mathrm{All}}}]\!]^{u_0}_W$. Then, with $d_R = adoc(u_n)$ and $u_0 \in S$, we can use $u_n$ as $u$ and $u_0$ as $u_{\mathrm{ctx}}$ in the definition of $D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$, which shows that $d_R \in D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$.

In conclusion, independent of whether $d_R$ satisfies the first or the second condition for being $(c_{\mathrm{All}}, S, P)$-reachable in $W$, we find that $d_R \in D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$. Hence, $D^{c_{\mathrm{All}}}_{\mathrm{LPE}}$ is not only a subset of all $(c_{\mathrm{All}}, S, P)$-reachable documents in $W$, but also a superset thereof, which shows that both sets are equivalent (as claimed in Lemma 4).

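$c_{\mathrm{None}}$-semantics: To prove Lemma 4 for $c_{\mathrm{None}}$ we show that the set $D^{c_{\mathrm{None}}}_{\mathrm{LPE}}$ is both a subset and a superset of the set of all $(c_{\mathrm{None}}, S, P)$-reachable documents in $W$.

To begin with the former, assume an arbitrary document $d_{\mathrm{LPE}} \in D^{c_{\mathrm{None}}}_{\mathrm{LPE}}$. We have to show that this document is $(c_{\mathrm{None}}, S, P)$-reachable in $W$. Since $d_{\mathrm{LPE}} \in D^{c_{\mathrm{None}}}_{\mathrm{LPE}}$, we know that there exist two URIs, $u_{\mathrm{ctx}}$ and $u$, such that

- $u_{\mathrm{ctx}} \in S$,

- $u \in [\![lpe^{c_{\mathrm{None}}}]\!]^{u_{\mathrm{ctx}}}_W$, and

- $d_{\mathrm{LPE}} = adoc(u)$.

Given that $lpe^{c_{\mathrm{None}}}$ is $\varepsilon$, by $u \in [\![lpe^{c_{\mathrm{None}}}]\!]^{u_{\mathrm{ctx}}}_W$ and Definition 5, we obtain that $u = u_{\mathrm{ctx}}$ and, thus, $d_{\mathrm{LPE}} = adoc(u_{\mathrm{ctx}})$. Therefore, document $d_{\mathrm{LPE}}$ is $(c_{\mathrm{None}}, S, P)$-reachable in $W$ because it satisfies the first of the two alternative conditions for reachability as given in Section 4.3. As a consequence, we can conclude that the set $D^{c_{\mathrm{None}}}_{\mathrm{LPE}}$ is a subset of the set of all $(c_{\mathrm{None}}, S, P)$-reachable documents in $W$.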

To show that $D^{c_{\mathrm{None}}}_{\mathrm{LPE}}$ is also a superset, let $d_R$ be an arbitrary document that is $(c_{\mathrm{None}}, S, P)$-reachable in $W$. We have to show that $d_R$ is in $D^{c_{\mathrm{None}}}_{\mathrm{LPE}}$. We note that $d_R$ can be $(c_{\mathrm{None}}, S, P)$-reachable in $W$ only if it satisfies the first of the two alternative conditions for reachability as given in Section 4.3 (for $c_{\mathrm{None}}$, the second condition cannot be satisfied by any document because $c_{\mathrm{None}}(t, u, P) = \mathrm{false}$ for every RDF triple $t$, URI $u$, and pattern $P$). Therefore, given that $d_R$ satisfies the first condition, there exists a URI $u_R \in S$ such that $adoc(u_R) = d_R$. Since $lpe^{c_{\mathrm{None}}}$ is $\varepsilon$, we also have $u_R \in [\![lpe^{c_{\mathrm{None}}}]\!]^{u_R}_W$. Therefore, we can use URI $u_R$ as both $u_{\mathrm{ctx}}$ and $u$ in the definition of $D^{c_{\mathrm{None}}}_{\mathrm{LPE}}$ and, thus, obtain that $d_R \in D^{c_{\mathrm{None}}}_{\mathrm{LPE}}$, which shows that the set $D^{c_{\mathrm{None}}}_{\mathrm{LPE}}$ is a superset of the set of all $(c_{\mathrm{None}}, S, P)$-reachable documents in $W$. Since we have shown before that $D^{c_{\mathrm{None}}}_{\mathrm{LPE}}$ is also a subset of the set of all documents that are $(c_{\mathrm{None}}, S, P)$-reachable in $W$, we conclude that both sets are equivalent. Hence, Lemma 4 holds for reachability criterion $c_{\mathrm{None}}$.


$c_{\mathrm{Match}}$-semantics: It remains to prove Lemma 4 for $c_{\mathrm{Match}}$. To this end, we show that the set $D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$ is both a subset and a superset of the set of all documents that are $(c_{\mathrm{Match}}, S, P)$-reachable in $W$. As before, we begin with the former.

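Let $d_{\mathrm{LPE}}$ be an arbitrary document in $D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$. We have to show that this document is $(c_{\mathrm{Match}}, S, P)$-reachable in $W$. Since $d_{\mathrm{LPE}} \in D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$, we know by the definition of $D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$ (as given in Lemma 4) that there exist two URIs, $u_{\mathrm{ctx}}$ and $u$, such that

- $u_{\mathrm{ctx}} \in S$,

- $u \in [\![lpe^{c_{\mathrm{Match}}}]\!]^{u_{\mathrm{ctx}}}_W$, and

- $d_{\mathrm{LPE}} = adoc(u)$.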

Given that $u \in [\![lpe^{c_{\mathrm{Match}}}]\!]^{u_{\mathrm{ctx}}}_W$, there exists a nonempty sequence of URIs $u_0, u_1, \ldots, u_n$ and a corresponding sequence of documents $d_0, d_1, \ldots, d_n$ such that

- $d_i = adoc(u_i)$ for each $i \in \{0, \ldots, n\}$,

- $u_0 = u_{\mathrm{ctx}}$,

- $u_n = u$ (and, thus, $d_n = d_{\mathrm{LPE}}$), and

- for each $i \in \{1, \ldots, n\}$, there exists a triple pattern $tp_k$ in $P$ ($1 \leq k \leq m$) such that $u_i \in [\![(?v, q_k)]\!]^{u_{i-1}}_W$, where $?v \in \{?s, ?p, ?o\}$ and $q_k$ is the LDQL query that corresponds to $tp_k$ as specified in the definition of $lpe^{c_{\mathrm{Match}}}$ above.

We show by induction over $n$ that all $n+1$ documents, $d_0, d_1, \ldots, d_n$, are $(c_{\mathrm{Match}}, S, P)$-reachable in $W$, and, thus, so is $d_{\mathrm{LPE}} = d_n$.

Base case ($n = 0$): $d_0$ is $(c_{\mathrm{Match}}, S, P)$-reachable in $W$ because $d_0 = adoc(u_0)$, $u_0 = u_{\mathrm{ctx}}$, and $u_{\mathrm{ctx}} \in S$; i.e., $d_0$ satisfies the first condition as specified in Section 4.3.

Induction step ($n > 0$): By induction, we assume that document $d_{n-1}$ is $(c_{\mathrm{Match}}, S, P)$-reachable in $W$. To show that $d_n$ is also $(c_{\mathrm{Match}}, S, P)$-reachable in $W$, we aim to show that $d_n$ satisfies the second condition for reachability as given in Section 4.3. That is, we aim to show that there exists a link graph edge $(d_{\mathrm{src}}, (t, u), d_{\mathrm{tgt}}) \in G_W$ such that (i) $d_{\mathrm{src}}$ is $(c_{\mathrm{Match}}, S, P)$-reachable in $W$, (ii) $c_{\mathrm{Match}}(t, u, P) = \mathrm{true}$, (iii) $u = u_n$, and (iv) $d_{\mathrm{tgt}} = d_n$. Let $d_{\mathrm{src}}$ be $d_{n-1}$, which is $(c_{\mathrm{Match}}, S, P)$-reachable in $W$ by our inductive hypothesis. Hence, it remains to show the existence of a link graph edge $(d_{n-1}, (t, u_n), d_n) \in G_W$ for which $c_{\mathrm{Match}}(t, u_n, P) = \mathrm{true}$. To this end, we use the fourth of the four aforementioned properties of the sequence of URIs $u_0, u_1, \ldots, u_n$. Let $tp_k = (s_k, p_k, o_k)$ be a triple pattern in $P$ such that $u_n \in [\![(?v, q_k)]\!]^{u_{n-1}}_W$, where $?v \in \{?s, ?p, ?o\}$ and $q_k$ is the LDQL query that corresponds to $tp_k$ as specified in the definition of $lpe^{c_{\mathrm{Match}}}$ above; i.e., $q_k$ is a basic LDQL query of the form $(\varepsilon, P_k)$ where SPARQL pattern $P_k$ contains the triple pattern $(?s, ?p, ?o)$ and (i) if $s_k \notin \mathcal{V}$, then $P_k$ contains FILTER $?s = s_k$, (ii) if $p_k \notin \mathcal{V}$, then $P_k$ contains FILTER $?p = p_k$, and (iii) if $o_k \notin \mathcal{V}$, then $P_k$ contains FILTER $?o = o_k$.

Since $u_n \in [\![(?v, q_k)]\!]^{u_{n-1}}_W$, by Definition 5, there exists a solution mapping $\mu$ such that $\mu \in [\![q_k]\!]^{\{u_{n-1}\}}_W$ and $\mu(?v) = u_n$. Moreover, since $q_k$ is of the form $(\varepsilon, P_k)$, we have $\mu \in [\![P_k]\!]_D$, where $D = \mathrm{dataset}_W(\{u_{n-1}\})$ with default graph $G = \mathrm{data}(d_{n-1})$. Then, due to the construction of $P_k$, it is easily verified that there exists an RDF triple $t \in \mathrm{data}(d_{n-1})$ such that $\mu[tp_k] = t$ and $u_n \in \mathrm{uris}(t)$. As a consequence, (i) $c_{\mathrm{Match}}(t, u_n, tp_k) = \mathrm{true}$ and (ii) by Definition 2, there exists a link graph edge $(d_{n-1}, (t, u_n), d_n) \in G_W$. Finally, since $tp_k$ is a triple pattern in $P$, we also have $c_{\mathrm{Match}}(t, u_n, P) = \mathrm{true}$.

While this concludes showing that the set $D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$ is a subset of the set of all documents that are $(c_{\mathrm{Match}}, S, P)$-reachable in $W$, we now show that it is also a superset thereof.

Let $d_R$ be a document that is $(c_{\mathrm{Match}}, S, P)$-reachable in $W$. We have to show that $d_R$ is in $D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$. Since $d_R$ is $(c_{\mathrm{Match}}, S, P)$-reachable in $W$, there exist a nonempty sequence of URIs $u_0, u_1, \ldots, u_n$, a corresponding sequence of documents

$d_0 = adoc(u_0),\ d_1 = adoc(u_1),\ d_2 = adoc(u_2),\ \ldots,\ d_n = adoc(u_n),$

and a corresponding sequence of link graph edges

$(d'_1, (t_1, u_1), d_1) \in G_W,\ (d'_2, (t_2, u_2), d_2) \in G_W,\ \ldots,\ (d'_n, (t_n, u_n), d_n) \in G_W$

such that

- $u_0 \in S$,

- $c_{\mathrm{Match}}(t_i, u_i, P) = \mathrm{true}$ for all $i \in \{1, \ldots, n\}$,

- $d'_i = d_{i-1}$ for all $i \in \{1, \ldots, n\}$, and

- $d_n = d_R$ and, thus, $d_R = adoc(u_n)$.

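We aim to show that each of the $n+1$ documents, $d_0, d_1, d_2, \ldots, d_n$, is in $D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$, and, thus, so is $d_R = d_n$. To this end, it is sufficient to show that each of the $n+1$ URIs, $u_0, u_1, \ldots, u_n$, is in $[\![lpe^{c_{\mathrm{Match}}}]\!]^{u_0}_W$. Then, with $u_0 \in S$, for each $i \in \{0, \ldots, n\}$ we can use URI $u_i$ as $u$ and $u_0$ as $u_{\mathrm{ctx}}$ in the definition of $D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$, which shows that document $d_i = adoc(u_i)$ is in $D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$. We use proof by induction.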

Base case ($n = 0$): Since $lpe^{c_{\mathrm{Match}}}$ is of the form $(\cdot)^*$, we have $u_0 \in [\![lpe^{c_{\mathrm{Match}}}]\!]^{u_0}_W$.

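Induction step ($n > 0$): By induction, we assume that $u_{n-1} \in [\![lpe^{c_{\mathrm{Match}}}]\!]^{u_0}_W$. Then, to show that $u_n$ is also in $[\![lpe^{c_{\mathrm{Match}}}]\!]^{u_0}_W$, it is sufficient to show that $u_n$ is in $[\![lpe^{c_{\mathrm{Match}}}_{\mathrm{step}}]\!]^{u_{n-1}}_W$, where $lpe^{c_{\mathrm{Match}}}_{\mathrm{step}}$ is $(?s, q_1) | (?p, q_1) | (?o, q_1) | \cdots | (?s, q_m) | (?p, q_m) | (?o, q_m)$ such that $lpe^{c_{\mathrm{Match}}} = (lpe^{c_{\mathrm{Match}}}_{\mathrm{step}})^*$. Due to the existence of link graph edge $(d'_n, (t_n, u_n), d_n) \in G_W$ with $d'_n = d_{n-1}$, we know by Definition 2 that there exists a triple $t_n \in \mathrm{data}(d_{n-1})$ with $u_n \in \mathrm{uris}(t_n)$. Moreover, since $c_{\mathrm{Match}}(t_n, u_n, P) = \mathrm{true}$, there exist both a triple pattern $tp_k$ in $P$ and a solution mapping $\mu$ such that $\mu[tp_k] = t_n$. Then, given the LDQL query $q_k = (\varepsilon, P_k)$ that is constructed for $tp_k$ as specified in the definition of $lpe^{c_{\mathrm{Match}}}$, it is easy to verify that there exists a solution mapping $\mu'$ such that $\mu' \in [\![P_k]\!]_D$ and $\mu'(?v) = u_n$, where $?v \in \{?s, ?p, ?o\}$ and $D = \mathrm{dataset}_W(\{u_{n-1}\})$ with default graph $G = \mathrm{data}(d_{n-1})$. Then, by Definition 5, we also have $\mu' \in [\![q_k]\!]^{\{u_{n-1}\}}_W$ and, thus, $u_n \in [\![(?v, q_k)]\!]^{u_{n-1}}_W$. Since $(?v, q_k)$ is a disjunct in $lpe^{c_{\mathrm{Match}}}_{\mathrm{step}}$, we also obtain

$u_n \in [\![lpe^{c_{\mathrm{Match}}}_{\mathrm{step}}]\!]^{u_{n-1}}_W$ and, thus, $u_n \in [\![lpe^{c_{\mathrm{Match}}}]\!]^{u_0}_W$.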

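As argued before, as a consequence of $u_n \in [\![lpe^{c_{\mathrm{Match}}}]\!]^{u_0}_W$ (and $u_0 \in S$), we can show that document $d_R = adoc(u_n)$ is in $D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$ by using $u_n$ as $u$ and $u_0$ as $u_{\mathrm{ctx}}$ in the definition of $D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$ (cf. Lemma 4). Therefore, the set $D^{c_{\mathrm{Match}}}_{\mathrm{LPE}}$ is not only a subset of the set of all documents that are $(c_{\mathrm{Match}}, S, P)$-reachable in $W$ (as shown before), but also a superset. Hence, both sets are equivalent and, thus, Lemma 4 holds for $c_{\mathrm{Match}}$.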


A.9 Proof of Theorem 6 

In the proof we use the following simple LDQL query $Q(?x)$ given by

$((+, p, \_), (?x, ?x, ?x))$ with $p \in \mathcal{U}$.

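We first prove that the reachability criterion $c_{\mathrm{None}}$ cannot express $Q(?x)$. On the contrary, assume that there exists a SPARQL pattern $P$ such that

$[\![P]\!]^{(c_{\mathrm{None}}, S)}_W = [\![Q(?x)]\!]^S_W$

for every $S$ and $W$. Let $u, u', a, b$ be different elements in $\mathcal{U}$ that are not mentioned in $P$. Consider now $W_1$ having only two documents $d_1 = \{(u, p, u')\}$ and $d_2 = \{(a, a, a)\}$ such that $adoc(u) = d_1$ and $adoc(u') = d_2$. Moreover, consider $W_2$ having also two documents $d_1 = \{(u, p, u')\}$ and $d_3 = \{(b, b, b)\}$ such that $adoc(u) = d_1$ and $adoc(u') = d_3$. First notice that $[\![Q(?x)]\!]^{\{u\}}_{W_1} = \{\{?x \mapsto a\}\} \neq [\![Q(?x)]\!]^{\{u\}}_{W_2} = \{\{?x \mapsto b\}\}$.

It is easy to see that $[\![P]\!]^{(c_{\mathrm{None}}, \{u\})}_{W_1} = [\![P]\!]^{(c_{\mathrm{None}}, \{u\})}_{W_2}$: just notice that from $\{u\}$, the set of reachable documents following the $c_{\mathrm{None}}$ criterion is the same set $\{d_1\}$ in both $W_1$ and $W_2$. Thus we have that $[\![P]\!]^{(c_{\mathrm{None}}, \{u\})}_{W_1} = [\![P]\!]^{(c_{\mathrm{None}}, \{u\})}_{W_2}$ but $[\![Q(?x)]\!]^{\{u\}}_{W_1} \neq [\![Q(?x)]\!]^{\{u\}}_{W_2}$, which is a contradiction.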

To continue with the proof, we now show that the reachability criterion $c_{\mathrm{All}}$ cannot express $Q(?x)$. To obtain a contradiction, assume that there exists a pattern $P$ such that

$[\![P]\!]^{(c_{\mathrm{All}}, S)}_W = [\![Q(?x)]\!]^S_W$

for every $S$ and $W$. Let $u, u', a, b$ be different elements in $\mathcal{U}$ that are not mentioned in $P$. Consider now $W_1 = (\{d_1, d_2, d_3\}, adoc_1)$ having three documents $d_1 = \{(u, p, u')\}$, $d_2 = \{(a, a, a)\}$ and $d_3 = \{(b, b, b)\}$ such that $adoc_1(u) = d_1$, $adoc_1(u') = d_2$ and $adoc_1(a) = d_3$. Moreover, consider $W_2 = (\{d_1, d_2, d_3\}, adoc_2)$ having exactly the same documents as $W_1$, and such that $adoc_2(u) = d_1$, $adoc_2(u') = d_3$ and $adoc_2(b) = d_2$. First notice that

$[\![Q(?x)]\!]^{\{u\}}_{W_1} = \{\{?x \mapsto a\}\} \qquad [\![Q(?x)]\!]^{\{u\}}_{W_2} = \{\{?x \mapsto b\}\}.$

Now notice that from $\{u\}$, the set of reachable documents in $W_1$ following the $c_{\mathrm{All}}$ criterion is the set $\{d_1, d_2, d_3\}$: $d_1$ is the document associated to $u$, $d_2$ is reachable from $d_1$ via the URI $u'$, and $d_3$ is reachable from $d_2$ via the URI $a$. Moreover, the set of reachable documents from $\{u\}$ in $W_2$ is also $\{d_1, d_2, d_3\}$: $d_1$ is the document associated to $u$, $d_3$ is reachable from $d_1$ via the URI $u'$, and $d_2$ is reachable from $d_3$ via the URI $b$.


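Given that the set of reachable documents is the same in both $W_1$ and $W_2$, we have $[\![P]\!]^{(c_{\mathrm{All}}, \{u\})}_{W_1} = [\![P]\!]^{(c_{\mathrm{All}}, \{u\})}_{W_2}$. Given that $[\![Q(?x)]\!]^{\{u\}}_{W_1} \neq [\![Q(?x)]\!]^{\{u\}}_{W_2}$, we obtain our desired contradiction.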
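The two Webs used for $c_{\mathrm{All}}$ can be checked mechanically. The sketch below assumes that $Q(?x)$ is the query $((+, p, \_), (?x, ?x, ?x))$ used in the proof of Theorem 4, models each Web as a Python dict from URI to document, and treats every term as a URI; it confirms that both Webs yield the same $c_{\mathrm{All}}$-reachable documents from $\{u\}$ while $Q(?x)$ returns different answers.

```python
D1 = frozenset({("u", "p", "u'")})
D2 = frozenset({("a", "a", "a")})
D3 = frozenset({("b", "b", "b")})
W1 = {"u": D1, "u'": D2, "a": D3}   # adoc_1
W2 = {"u": D1, "u'": D3, "b": D2}   # adoc_2


def reachable_all(web, seeds):
    """(c_All, S, P)-reachable documents: follow every URI that has a document."""
    seen = {u for u in seeds if u in web}
    frontier = set(seen)
    while frontier:
        frontier = {x for u in frontier for t in web[u] for x in t if x in web} - seen
        seen |= frontier
    return {web[u] for u in seen}


def eval_q(web, seed, p="p"):
    """Q(?x) = ((+, p, _), (?x, ?x, ?x)): follow (+, p, _) once, then match (?x, ?x, ?x)."""
    targets = {o for (s, pr, o) in web.get(seed, ()) if s == seed and pr == p and o in web}
    data = set().union(*(web[u] for u in targets))
    return {s for (s, pr, o) in data if s == pr == o}


print(reachable_all(W1, {"u"}) == reachable_all(W2, {"u"}))  # True
print(eval_q(W1, "u"), eval_q(W2, "u"))                      # {'a'} {'b'}
```

Since reachability-based evaluation only sees the union of the reached documents, any pattern $P$ receives the same input in both Webs, whereas $Q(?x)$ depends on which document sits behind $u'$.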

We now consider the case of $c_{\mathrm{Match}}$, and prove that it cannot express $Q(?x)$. To obtain a contradiction, assume that there exists a pattern $P$ such that

$[\![P]\!]^{(c_{\mathrm{Match}}, S)}_W = [\![Q(?x)]\!]^S_W$

for every $S$ and $W$. Let $u, u', u'', a$ be different elements in $\mathcal{U}$ that are not mentioned in $P$. Consider now $W_1$ having two documents $d_1 = \{(u, p, u')\}$ and $d_2 = \{(a, a, a)\}$ such that $adoc(u) = d_1$ and $adoc(u') = d_2$. Moreover, consider $W_2$ having also two documents $d'_1 = \{(u'', p, u')\}$ and $d'_2 = \{(a, a, a)\}$ such that $adoc(u) = d'_1$ and $adoc(u') = d'_2$. First notice that

$[\![Q(?x)]\!]^{\{u\}}_{W_1} = \{\{?x \mapsto a\}\} \neq [\![Q(?x)]\!]^{\{u\}}_{W_2} = \emptyset.$

We now prove that $[\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_1} = [\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_2}$. Given that $d_1$ is the document associated to $u$ in $W_1$, we have that $d_1$ is $(c_{\mathrm{Match}}, \{u\}, P)$-reachable in $W_1$. Similarly, we know that $d'_1$ is $(c_{\mathrm{Match}}, \{u\}, P)$-reachable in $W_2$. Moreover, given that $P$ does not mention $u$, $u'$ and $u''$, we have that $(u, p, u')$ matches a triple pattern in $P$ if and only if $(u'', p, u')$ matches a triple pattern in $P$. Thus $d_2$ is $(c_{\mathrm{Match}}, \{u\}, P)$-reachable in $W_1$ if and only if $d'_2$ is $(c_{\mathrm{Match}}, \{u\}, P)$-reachable in $W_2$, and we have only two cases, either

- $\{d_1\}$ is the set of $(c_{\mathrm{Match}}, \{u\}, P)$-reachable documents in $W_1$, and $\{d'_1\}$ is the set of $(c_{\mathrm{Match}}, \{u\}, P)$-reachable documents in $W_2$, or

- $\{d_1, d_2\}$ is the set of $(c_{\mathrm{Match}}, \{u\}, P)$-reachable documents in $W_1$, and $\{d'_1, d'_2\}$ is the set of $(c_{\mathrm{Match}}, \{u\}, P)$-reachable documents in $W_2$.

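In the first case we have that $[\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_1}$ is obtained by evaluating $P$ over graph $G_1 = \{(u, p, u')\}$, and that $[\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_2}$ is obtained by evaluating $P$ over graph $G_2 = \{(u'', p, u')\}$. Given that $P$ does not mention $u$, $u'$ and $u''$, we obtain that the evaluation of $P$ over $G_1$ is the same as the evaluation of $P$ over $G_2$, which implies that $[\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_1} = [\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_2}$. In the second case, $[\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_1}$ is obtained by evaluating $P$ over graph $G_1 = \{(u, p, u'), (a, a, a)\}$, and $[\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_2}$ is obtained by evaluating $P$ over graph $G_2 = \{(u'', p, u'), (a, a, a)\}$. For the same reason as above, the evaluation of $P$ is the same over $G_1$ and over $G_2$, which implies that $[\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_1} = [\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_2}$. We have proved that $[\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_1} = [\![P]\!]^{(c_{\mathrm{Match}}, \{u\})}_{W_2}$ while $[\![Q(?x)]\!]^{\{u\}}_{W_1} \neq [\![Q(?x)]\!]^{\{u\}}_{W_2}$, which is our desired contradiction.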

A.10 Proof of Proposition 2

Property 1 Let $q$ be an arbitrary basic LDQL query of the form $(lpe, P)$ such that $lpe$ is Web-safe. To show that $q$ is Web-safe we provide Algorithm 1. In line 3 the algorithm calls a subroutine, ExecLPE, that evaluates a given LPE in the context of a given URI (cf. Algorithm 2). The correctness of the algorithm and its subroutine is easily checked. Moreover, a trivial proof by induction on the possible structure of LPEs can show that for any Web-safe LPE, the given subroutine looks up only a finite number of URIs. The crux of such a proof is twofold: First, the evaluation of LPEs of the form $lpe^*$ (lines 26 to 34 in Algorithm 2) is guaranteed to reach a fixed point for any finite Web of Linked Data. Second, the evaluation of LPEs of the form $(?v, q)$ (lines 38 to 42) uses an algorithm for subquery $q$ that has the properties as required in Definition 6. Due to the Web-safeness of the given LPE and, thus, of $q$, such an algorithm exists.


Algorithm 1 Execution of a basic LDQL query $(lpe, P)$ using a set $S$ of URIs as seed.
1: $\Phi :=$ a new empty set of URIs
2: for all $u \in S$ do
3:   $\Phi := \Phi \cup$ ExecLPE$(lpe, u)$
4: end for
5: $G :=$ a new empty set of RDF triples (i.e., an empty RDF graph)
6: $N :=$ a new empty set of pairs consisting of a URI and an RDF graph
7: for all $u \in \Phi$ do
8:   if looking up URI $u$ results in retrieving a document, say $d$, then
9:     $G := G \cup \mathrm{data}(d)$
10:    $N := N \cup \{(u, \mathrm{data}(d))\}$
11:  end if
12: end for
13: return $[\![P]\!]_D$ with $D = (G, N)$ // can be computed by using any algorithm that implements
      // the standard (set-based) SPARQL evaluation function [2]
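A minimal Python rendering of Algorithm 1 may help to see that, apart from the calls to ExecLPE, it performs only one lookup per URI in $\Phi$. The sketch makes simplifying assumptions that are not part of the paper: a Web is a dict from URI to a frozenset of triples (a URI "retrieves" a document iff it is a key), ExecLPE is passed in as a callable for a fixed $lpe$, and the SPARQL pattern $P$ is restricted to a single triple pattern evaluated over the default graph $G$ with set semantics.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Web = Dict[str, FrozenSet[Triple]]      # toy model: looking up u "retrieves" web[u]
Mapping = Dict[str, str]


def match_triple(tp: Triple, t: Triple) -> Optional[Mapping]:
    """Return the solution mapping mu with mu[tp] = t, or None if there is none."""
    mu: Mapping = {}
    for term, value in zip(tp, t):
        if term.startswith("?"):
            if mu.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return mu


def exec_basic_query(web: Web, seeds: Set[str],
                     exec_lpe: Callable[[str], Set[str]], tp: Triple) -> List[Mapping]:
    """Algorithm 1, with P simplified to a single triple pattern tp over the default graph G."""
    phi: Set[str] = set()                                  # line 1
    for u in seeds:                                        # lines 2-4
        phi |= exec_lpe(u)
    graph: Set[Triple] = set()                             # line 5: default graph G
    named: Set[Tuple[str, FrozenSet[Triple]]] = set()      # line 6: named graphs N
    for u in phi:                                          # lines 7-12
        if u in web:                                       # lookup of u retrieves a document
            graph |= web[u]
            named.add((u, web[u]))
    # Line 13, simplified: a lone triple pattern only needs G, so N is collected but unused.
    solutions: List[Mapping] = []
    for t in graph:
        mu = match_triple(tp, t)
        if mu is not None and mu not in solutions:
            solutions.append(mu)
    return solutions


demo = {"u": frozenset({("u", "p", "v"), ("u", "q", "w")})}
print(exec_basic_query(demo, {"u"}, lambda u: {u}, ("?s", "p", "?o")))  # [{'?s': 'u', '?o': 'v'}]
```

Here `lambda u: {u}` plays the role of ExecLPE for $lpe = \varepsilon$ on a URI with a document; a full implementation would evaluate arbitrary SPARQL patterns over the dataset $(G, N)$.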


Property 2 First, let $q$ be an LDQL query of the form $\pi_V\, q'$ such that subquery $q'$ is Web-safe. Due to the Web-safeness of $q'$, there exists an algorithm for $q'$ that has the properties as required in Definition 6. We may use this algorithm to construct an algorithm for $q$; that is, our algorithm for $q$ calls the algorithm for $q'$, applies the projection operator to the result, and returns the set of solution mappings resulting from this projection. Since the application of the projection operator does not involve URI lookups, the constructed algorithm for $q$ has the properties as required in Definition 6. Second, let $q$ be an LDQL query of the form $(\mathrm{SEED}\ U\ q')$ such that $q'$ is Web-safe. Hence, there exists an algorithm for $q'$ that has the properties as required in Definition 6. Then, showing the Web-safeness of $q$ is trivial because the algorithm for $q'$ can also be used for $q$.

Property 3 Let $q$ be an LDQL query of the form $(q_1\ \mathrm{UNION}\ \ldots\ \mathrm{UNION}\ q_n)$ such that each subquery $q_i$ ($1 \leq i \leq n$) is Web-safe. Hence, for each subquery there exists an algorithm that has the properties as required in Definition 6. Then, the Web-safeness of query $q$ is easily shown by specifying another algorithm that calls the algorithms of the subqueries sequentially and unions their results.

A.11 Proof of Lemma 3

Lemma 3 follows from Definition 7 and Buil-Aranda et al.’s result [4, Proposition 1]. 

A.12 Proof of Theorem 7 

We prove Theorem 7 based on Algorithm 3, which is an iterative algorithm that generalizes the execution strategy outlined for query $q'$ in Example 6. That is, the algorithm executes the subqueries $q_1, q_2, \ldots, q_m$ sequentially in the order $\prec$ such that each iteration step (lines 2 to 24) executes one of the subqueries by using the solution mappings computed during the previous step (which are passed on via the sets $\Omega_0, \ldots, \Omega_m$).
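The sequential strategy can be sketched in Python as follows. This is an illustration under explicit assumptions, not the paper's algorithm verbatim: subqueries are given as hypothetical `("plain", exec_q)` or `("seed", "?v", exec_q)` entries, where `exec_q` stands in for Exec and maps a seed set to a list of solution mappings; a `SEED ?v q'` result is assumed to also bind `?v` to the seed URI; and the part of the iteration step not shown in this section is assumed to join $\Omega_{j-1}$ with the newly computed mappings.

```python
from typing import Callable, Dict, List, Set

Mapping = Dict[str, str]


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two solution mappings agree on every variable they share."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)


def join(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    """SPARQL-style join of two sets of solution mappings (kept as duplicate-free lists)."""
    out: List[Mapping] = []
    for m1 in left:
        for m2 in right:
            merged = {**m1, **m2}
            if compatible(m1, m2) and merged not in out:
                out.append(merged)
    return out


def exec_and(subqueries: list, seeds: Set[str], is_uri: Callable[[str], bool]) -> List[Mapping]:
    """Run q_1 AND ... AND q_m in the given order, one subquery per iteration step."""
    omega: List[Mapping] = [{}]                  # Omega_0 = {empty mapping}
    for sub in subqueries:
        tmp: List[Mapping] = []
        if sub[0] == "seed":                     # (SEED ?v q'): seeds come from Omega_{j-1}
            _, var, exec_q = sub
            uris = {m[var] for m in omega if var in m and is_uri(m[var])}
            for u in sorted(uris):
                # Assumption: SEED ?v q' also binds ?v to the seed URI u.
                for m in join(exec_q({u}), [{var: u}]):
                    if m not in tmp:
                        tmp.append(m)
        else:                                    # any other subquery is evaluated from S
            _, exec_q = sub
            tmp = exec_q(seeds)
        omega = join(omega, tmp)                 # assumption: remaining steps join the results
    return omega
```

The point of the ordering $\prec$ is visible in the `"seed"` branch: a `SEED ?v` subquery can only be executed once an earlier step has bound `?v` to a finite set of URIs.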





Algorithm 2 ExecLPE($lpe$, $u_{\mathrm{ctx}}$)

1: if looking up URI $u_{\mathrm{ctx}}$ results in retrieving a document, say $d_{\mathrm{ctx}}$, then
2:   if $lpe$ is $\varepsilon$ then
3:     return a new singleton set $\{u_{\mathrm{ctx}}\}$
4:   else if $lpe$ is a link pattern $lp = (y_1, y_2, y_3)$ then
5:     $lp' := (y'_1, y'_2, y'_3)$, where $(y'_1, y'_2, y'_3)$ is a link pattern generated from $lp$ such that any occurrence of symbol $+$ in $lp$ is replaced by URI $u_{\mathrm{ctx}}$
6:     $\Phi :=$ a new empty set of URIs
7:     for all $(x_1, x_2, x_3) \in \mathrm{data}(d_{\mathrm{ctx}})$ do
8:       if ($y'_1 = x_1$ or $y'_1 = \_$) and ($y'_2 = x_2$ or $y'_2 = \_$) and ($y'_3 = x_3$ or $y'_3 = \_$) then
9:         for all $i \in \{1, 2, 3\}$ do
10:          if $y'_i = \_$ and $x_i$ is a URI whose lookup retrieves a document then
11:            $\Phi := \Phi \cup \{x_i\}$
12:          end if
13:        end for
14:      end if
15:    end for
16:    return $\Phi$
17:  else if $lpe$ is of the form $lpe_1/lpe_2$ then
18:    $\Phi' :=$ ExecLPE($lpe_1$, $u_{\mathrm{ctx}}$)
19:    $\Phi :=$ a new empty set of URIs
20:    for all $u' \in \Phi'$ do $\Phi := \Phi \cup$ ExecLPE($lpe_2$, $u'$) end for
21:    return $\Phi$
22:  else if $lpe$ is of the form $lpe_1 \mid lpe_2$ then
23:    $\Phi_1 :=$ ExecLPE($lpe_1$, $u_{\mathrm{ctx}}$)
24:    $\Phi_2 :=$ ExecLPE($lpe_2$, $u_{\mathrm{ctx}}$)
25:    return $\Phi_1 \cup \Phi_2$
26:  else if $lpe$ is of the form $l^*$ then
27:    $\Phi_{\mathrm{cur}} :=$ ExecLPE($\varepsilon$, $u_{\mathrm{ctx}}$)
28:    $lpe' := l$
29:    repeat
30:      $\Phi_{\mathrm{prev}} := \Phi_{\mathrm{cur}}$
31:      $\Phi_{\mathrm{cur}} := \Phi_{\mathrm{cur}} \cup$ ExecLPE($lpe'$, $u_{\mathrm{ctx}}$)
32:      $lpe' :=$ an LPE of the form $lpe'/l$
33:    until $\Phi_{\mathrm{cur}} = \Phi_{\mathrm{prev}}$
34:    return $\Phi_{\mathrm{cur}}$
35:  else if $lpe$ is of the form $[lpe']$ then
36:    $\Phi :=$ ExecLPE($lpe'$, $u_{\mathrm{ctx}}$)
37:    if $\Phi \neq \emptyset$ then return a new singleton set $\{u_{\mathrm{ctx}}\}$ else return a new empty set end if
38:  else if $lpe$ is of the form $\langle ?v, q \rangle$ then
39:    $\Omega :=$ Exec($q$, $\{u_{\mathrm{ctx}}\}$)  // where Exec denotes an arbitrary algorithm that can be used
                                   // to compute the $\{u_{\mathrm{ctx}}\}$-based evaluation of $q$ over the queried
                                   // Web of Linked Data
40:    $\Phi :=$ a new empty set of URIs
41:    for all $\mu \in \Omega$ for which $?v \in \mathrm{dom}(\mu)$ and $\mu(?v) \in \mathcal{U}$ do $\Phi := \Phi \cup \{\mu(?v)\}$ end for
42:    return $\Phi$
43:  end if
44: else
45:  return a new empty set
46: end if
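To make the recursion in Algorithm 2 concrete, the following Python sketch runs it over a small in-memory stand-in for the Web of Linked Data. The encoding is an assumption of this sketch, not part of LDQL: a dictionary maps each URI whose lookup retrieves a document to the triples in that document, LPEs are tagged tuples, literals are quoted strings, and the hypothetical `exec_query` callback plays the role of Exec in line 39.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# Toy Web (assumption): URI whose lookup retrieves a document -> triples in it.
Web = Dict[str, Set[Triple]]


def is_uri(term: str) -> bool:
    # Toy convention (assumption): literals are written with surrounding quotes.
    return not term.startswith('"')


def exec_lpe(lpe: tuple, u_ctx: str, web: Web,
             exec_query: Optional[Callable[[object, Set[str]], List[Dict[str, str]]]] = None
             ) -> Set[str]:
    if u_ctx not in web:                      # lines 1, 44-45: lookup of u_ctx fails
        return set()
    kind = lpe[0]
    if kind == "eps":                         # lines 2-3
        return {u_ctx}
    if kind == "lp":                          # lines 4-16: link pattern (y1, y2, y3)
        lp = tuple(u_ctx if y == "+" else y for y in lpe[1])  # replace '+' by u_ctx
        result: Set[str] = set()
        for triple in web[u_ctx]:
            if all(y == x or y == "_" for y, x in zip(lp, triple)):
                for y, x in zip(lp, triple):
                    if y == "_" and is_uri(x) and x in web:   # x can be looked up
                        result.add(x)
        return result
    if kind == "seq":                         # lines 17-21: lpe1/lpe2
        result = set()
        for u in exec_lpe(lpe[1], u_ctx, web, exec_query):
            result |= exec_lpe(lpe[2], u, web, exec_query)
        return result
    if kind == "alt":                         # lines 22-25: lpe1 | lpe2
        return (exec_lpe(lpe[1], u_ctx, web, exec_query)
                | exec_lpe(lpe[2], u_ctx, web, exec_query))
    if kind == "star":                        # lines 26-34: l*
        current = exec_lpe(("eps",), u_ctx, web, exec_query)
        step = lpe[1]
        while True:
            previous = set(current)
            current |= exec_lpe(step, u_ctx, web, exec_query)
            step = ("seq", step, lpe[1])      # next round evaluates l/l/.../l
            if current == previous:
                return current
    if kind == "test":                        # lines 35-37: [lpe']
        return {u_ctx} if exec_lpe(lpe[1], u_ctx, web, exec_query) else set()
    if kind == "query":                       # lines 38-42: <?v, q>
        var, q = lpe[1], lpe[2]
        return {m[var] for m in exec_query(q, {u_ctx}) if var in m and is_uri(m[var])}
    raise ValueError(f"unknown LPE kind: {kind}")


web: Web = {
    "ex:a": {("ex:a", "foaf:knows", "ex:b")},
    "ex:b": {("ex:b", "foaf:knows", "ex:c"), ("ex:b", "foaf:name", '"Bob"')},
    "ex:c": {("ex:c", "foaf:knows", "ex:a")},
}
knows = ("lp", ("+", "foaf:knows", "_"))
print(sorted(exec_lpe(("star", knows), "ex:a", web)))  # ['ex:a', 'ex:b', 'ex:c']
```

The sketch recomputes each path prefix from scratch in the star case, mirroring lines 29 to 33 rather than optimizing them.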

Algorithm 3 Execution of an LDQL query $q$ of the form $(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_m)$ using a finite set $S$ of URIs as seed.

Require: $m \geq 1$
Require: LDQL query $q$ is given as an array $Q$ consisting of all subqueries of $q$ such that the order of the subqueries in this array satisfies the conditions given in Theorem 7.

1: $\Omega_0 := \{\mu_\emptyset\}$, where $\mu_\emptyset$ is the empty solution mapping; i.e., $\mathrm{dom}(\mu_\emptyset) = \emptyset$
2: for $j := 1, \dots, m$ do
3:   $\Omega_{\mathrm{tmp}} :=$ a new empty set of solution mappings
4:   $q_j :=$ the $j$-th subquery in array $Q$
5:   if $q_j$ is of the form $(\mathrm{SEED}\; ?v\; q')$ then
6:     $U_{\mathrm{tmp}} :=$ a new empty set of URIs
7:     for all $\mu \in \Omega_{j-1}$ do
8:       if $\mu(?v)$ is a URI then $U_{\mathrm{tmp}} := U_{\mathrm{tmp}} \cup \{\mu(?v)\}$ end if
9:     end for
10:    for all $u \in U_{\mathrm{tmp}}$ do
11:      $\Omega_{\mathrm{tmp}} := \Omega_{\mathrm{tmp}} \cup$ Exec($q'$, $\{u\}$)  // where Exec denotes an arbitrary algorithm that
                                   // can be used to compute the $\{u\}$-based evaluation
                                   // of $q'$ over the queried Web of Linked Data
12:    end for
13:  else
14:    $\Omega_{\mathrm{tmp}} :=$ Exec($q_j$, $S$)  // where Exec denotes an arbitrary algorithm that can
                                   // be used to compute the $S$-based evaluation of $q_j$
                                   // over the queried Web of Linked Data
15:  end if
16:  $\Omega_j :=$ a new empty set of solution mappings
17:  for all $\mu \in \Omega_{j-1}$ do
18:    for all $\mu' \in \Omega_{\mathrm{tmp}}$ do
19:      if $\mu$ and $\mu'$ are compatible then
20:        $\Omega_j := \Omega_j \cup \{\mu_{\mathrm{join}}\}$, where $\mu_{\mathrm{join}} = \mu \cup \mu'$
21:      end if
22:    end for
23:  end for
24: end for
25: return $\Omega_m$
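The control flow of Algorithm 3 can be sketched in Python as below. This is a hedged illustration under our own encoding, not the authors' implementation: each subquery is a hypothetical tagged tuple, either `("plain", evaluate)` or `("seed", var, evaluate)`, where `evaluate` stands in for Exec and returns a list of solution mappings (dicts) for a given seed set; the list is assumed to be already ordered as Theorem 7 requires, and lists of dicts stand in for sets of solution mappings.

```python
from typing import Callable, Dict, List, Set

Mapping = Dict[str, str]
# Hypothetical stand-in for Exec: seed set -> solution mappings of one subquery.
Evaluator = Callable[[Set[str]], List[Mapping]]


def compatible(m1: Mapping, m2: Mapping) -> bool:
    # Two solution mappings are compatible if they agree on every shared variable.
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def add_unique(target: List[Mapping], m: Mapping) -> None:
    # Keep the list duplicate-free so that it behaves like a set of mappings.
    if m not in target:
        target.append(m)


def exec_and(subqueries: List[tuple], seeds: Set[str],
             is_uri: Callable[[str], bool] = lambda t: not t.startswith('"')) -> List[Mapping]:
    omega: List[Mapping] = [{}]                 # line 1: Omega_0 = {empty mapping}
    for sub in subqueries:                      # line 2: order assumed to satisfy Theorem 7
        omega_tmp: List[Mapping] = []           # line 3
        if sub[0] == "seed":                    # line 5: (SEED ?v q')
            _, var, evaluate = sub
            # lines 6-9: collect the URIs bound to ?v by the previous result
            u_tmp = {m[var] for m in omega if var in m and is_uri(m[var])}
            for u in sorted(u_tmp):             # lines 10-12: one call per URI
                for m in evaluate({u}):
                    add_unique(omega_tmp, m)
        else:                                   # line 14: evaluate from the seed set S
            _, evaluate = sub
            for m in evaluate(seeds):
                add_unique(omega_tmp, m)
        joined: List[Mapping] = []              # line 16
        for m in omega:                         # lines 17-23: nested-loop join
            for m2 in omega_tmp:
                if compatible(m, m2):
                    add_unique(joined, {**m, **m2})
        omega = joined
    return omega                                # line 25


names = {"ex:b": '"Bob"', "ex:c": '"Carol"'}
q1 = ("plain", lambda S: [{"?p": "ex:b"}, {"?p": "ex:c"}])
q2 = ("seed", "?p", lambda S: [{"?p": u, "?name": names[u]} for u in S if u in names])
print(exec_and([q1, q2], {"ex:a"}))
# [{'?p': 'ex:b', '?name': '"Bob"'}, {'?p': 'ex:c', '?name': '"Carol"'}]
```

The seed subquery `q2` is never evaluated from the original seed set; it only receives URIs that `q1` has already bound to `?p`, which is why the first subquery in the order must not be of the SEED form.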


To prove that Algorithm 3 has the properties required in Definition 6, we have to show that the algorithm is sound and complete (i.e., for any finite set $S$ of URIs and any Web of Linked Data $W$, the algorithm returns $[\![q]\!]^S_W$) and that it is guaranteed to look up a finite number of URIs only. We show these properties by induction on the $m$ iteration steps performed by the algorithm. To this end, we assume that the indices used for the subqueries $q_1, q_2, \dots, q_m$ reflect the order $\prec$; that is, subquery $q_1$ is the first according to $\prec$, subquery $q_2$ is the second, and so on.

Base Case ($m = 1$): By the conditions in Theorem 7, the first subquery (according to $\prec$) must be Web-safe and, thus, cannot be of the form $(\mathrm{SEED}\; ?v\; q')$. Hence, the algorithm enters the corresponding else-branch (line 14). Due to the Web-safeness of $q_1$, there exists an algorithm for subquery $q_1$, say $A_1$, that has the properties required in Definition 6. Algorithm 3 uses algorithm $A_1$ to obtain $\Omega_{\mathrm{tmp}} = [\![q_1]\!]^S_W$ (where $W$ is the queried Web of Linked Data), which requires only a finite number of URI lookups. Thereafter, Algorithm 3 computes $\Omega_1 = \Omega_0 \bowtie \Omega_{\mathrm{tmp}}$ (lines 16 to 23) and returns $\Omega_1$ (line 25), which does not require any more URI lookups. Hence, for $m = 1$, the algorithm looks up a finite number of URIs only (if the queried Web of Linked Data is finite). Since $\Omega_0$ contains only the empty solution mapping $\mu_\emptyset$ (line 1), which is compatible with any other solution mapping, we have $\Omega_1 = \Omega_{\mathrm{tmp}}$ and, thus, $\Omega_1 = [\![q_1]\!]^S_W$.
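The key step of the base case, that joining with $\Omega_0 = \{\mu_\emptyset\}$ leaves $\Omega_{\mathrm{tmp}}$ unchanged, can be checked on a small example. The following Python sketch is illustrative only: solution mappings are encoded (by assumption) as dicts from variable names to values, and `join` is a hypothetical helper that mirrors the nested loops in lines 16 to 23.

```python
from typing import Dict, List

Mapping = Dict[str, str]


def compatible(m1: Mapping, m2: Mapping) -> bool:
    # Compatible means the two mappings agree on every variable they both bind.
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    # Nested-loop join in the style of lines 16-23 of Algorithm 3.
    result: List[Mapping] = []
    for m in left:
        for m2 in right:
            if compatible(m, m2) and {**m, **m2} not in result:
                result.append({**m, **m2})
    return result


omega_0 = [{}]  # only the empty solution mapping
omega_tmp = [{"?x": "ex:a"}, {"?x": "ex:b", "?y": "ex:c"}]
print(join(omega_0, omega_tmp) == omega_tmp)  # True
print(join([{"?x": "ex:a"}], omega_tmp))      # [{'?x': 'ex:a'}]
```

The empty mapping shares no variables with any other mapping, so it is compatible with all of them, and $\mu_\emptyset \cup \mu' = \mu'$; a non-empty left side, by contrast, filters out incompatible mappings.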

Induction Step ($m > 1$): By induction we assume that after completing the $(m-1)$-th iteration, the algorithm has looked up a finite number of URIs only and the current intermediate result $\Omega_{m-1}$ covers the conjunction of subqueries $q_1, q_2, \dots, q_{m-1}$; that is, $\Omega_{m-1} = [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_{m-1})]\!]^S_W$. We show that the $m$-th iteration also looks up a finite number of URIs only and that $\Omega_m = [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_m)]\!]^S_W$.

If subquery $q_m$ is Web-safe, it is not difficult to see these properties: since $q_m$ is Web-safe, there exists an algorithm for $q_m$, say $A_m$, that has the properties required in Definition 6. The corresponding call of algorithm $A_m$ in line 14 of Algorithm 3 looks up a finite number of URIs only, and the subsequent join computation in lines 16 to 23 does not require any more lookups. Moreover, the result of calling algorithm $A_m$ in line 14 is $\Omega_{\mathrm{tmp}} = [\![q_m]\!]^S_W$ and, since the subsequent join computation returns $\Omega_m = \Omega_{m-1} \bowtie \Omega_{\mathrm{tmp}}$, the induction hypothesis and Definition 5 give $\Omega_m = [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_m)]\!]^S_W$, as desired.

It remains to discuss the case of subquery $q_m$ being of the form $(\mathrm{SEED}\; ?v\; q')$, where, by the conditions in Theorem 7, subquery $q'$ is Web-safe. Hence, there exists an algorithm for $q'$, say $A'$, that has the properties required in Definition 6. In this case, Algorithm 3 first iterates over all solution mappings in $\Omega_{m-1}$ to populate a set $U_{\mathrm{tmp}}$ with all URIs that any of these mappings binds to variable $?v$ (lines 6 to 9). Due to the finiteness assumed for all queried Webs of Linked Data (cf. Definition 6), $\Omega_{m-1}$ is finite. Hence, the resulting set $U_{\mathrm{tmp}}$ contains a finite number of URIs. Therefore, the subsequent loop in lines 10 to 12 calls algorithm $A'$ a finite number of times and, thus, the $m$-th iteration looks up a finite number of URIs only.
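The finiteness argument rests on the bound $|U_{\mathrm{tmp}}| \le |\Omega_{m-1}|$: every URI in $U_{\mathrm{tmp}}$ is contributed by at least one mapping in $\Omega_{m-1}$. A minimal Python sketch of lines 6 to 12 that counts the calls to $A'$ illustrates this; the function name `seed_step_calls`, the dict encoding of mappings, and the quoted-literal convention are assumptions made only for this example.

```python
from typing import Callable, Dict, List, Set

Mapping = Dict[str, str]


def seed_step_calls(omega_prev: List[Mapping], var: str,
                    evaluate: Callable[[Set[str]], List[Mapping]]) -> int:
    """Run lines 6-12 of Algorithm 3 and return how often A' (evaluate) is called."""
    # Lines 6-9: URIs bound to var (literals are quoted strings in this toy encoding).
    u_tmp = {m[var] for m in omega_prev if var in m and not m[var].startswith('"')}
    calls = 0
    for u in u_tmp:                 # lines 10-12: exactly one call per URI
        evaluate({u})
        calls += 1
    return calls


omega_prev = [{"?v": "ex:a"}, {"?v": "ex:a", "?w": "ex:b"}, {"?v": '"lit"'}, {"?w": "ex:c"}]
n = seed_step_calls(omega_prev, "?v", lambda S: [])
print(n, len(omega_prev))  # 1 4
```

Since each call of $A'$ looks up finitely many URIs, a finite number of calls keeps the total number of lookups in the $m$-th iteration finite.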
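To show the remaining claim, $\Omega_m = [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_m)]\!]^S_W$, we first show $\Omega_m \subseteq [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_m)]\!]^S_W$. Let $\mu_{\mathrm{join}}$ be an arbitrary solution mapping in $\Omega_m$, i.e., $\mu_{\mathrm{join}} \in \Omega_m$. By lines 17 to 23, there exist solution mappings $\mu$ and $\mu'$ such that (i) $\mu \in \Omega_{m-1} = [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_{m-1})]\!]^S_W$, (ii) $\mu' \in \Omega_{\mathrm{tmp}}$, (iii) $\mu$ and $\mu'$ are compatible, and (iv) $\mu_{\mathrm{join}} = \mu \cup \mu'$. Then, by Definition 5, we have $\mu_{\mathrm{join}} \in [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_m)]\!]^S_W$ and, thus, $\Omega_m \subseteq [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_m)]\!]^S_W$.

Finally, we show $\Omega_m \supseteq [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_m)]\!]^S_W$. Assume an arbitrary solution mapping $\mu^* \in [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_m)]\!]^S_W$. Then, by Definition 5, there exist two solution mappings $\mu_1$ and $\mu_2$ such that (i) $\mu_1 \in [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_{m-1})]\!]^S_W$, (ii) $\mu_2 \in [\![q_m]\!]^S_W$, (iii) $\mu_1$ and $\mu_2$ are compatible, and (iv) $\mu^* = \mu_1 \cup \mu_2$. By our induction hypothesis, we have $\mu_1 \in \Omega_{m-1}$. Then, given lines 17 to 23, it remains to show that $\mu_2 \in \Omega_{\mathrm{tmp}}$, where $\Omega_{\mathrm{tmp}}$ is the set of solution mappings computed during the $m$-th iteration. Since $q_m$ is of the form $(\mathrm{SEED}\; ?v\; q')$, it holds that $\Omega_{\mathrm{tmp}} = \bigcup_{u \in U_{\mathrm{tmp}}} [\![q']\!]^{\{u\}}_W$, where $U_{\mathrm{tmp}} = \{u \in \mathcal{U} \mid \mu(?v) = u \text{ for some } \mu \in [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_{m-1})]\!]^S_W\}$. Hence, to show that $\mu_2 \in \Omega_{\mathrm{tmp}}$, we show that there exists a URI $u \in U_{\mathrm{tmp}}$ such that $\mu_2$ is in $[\![q']\!]^{\{u\}}_W$. Since $\mu_2 \in [\![q_m]\!]^S_W$, by Definition 5, solution mapping $\mu_2$ binds variable $?v$ to a URI, say $u^*$; i.e., $?v \in \mathrm{dom}(\mu_2)$ and $\mu_2(?v) = u^*$ with $u^* \in \mathcal{U}$. Furthermore, by Lemma 3 and the condition in Theorem 7 (i.e., $?v \in \mathrm{sbvars}(q_k)$), solution mapping $\mu_1$ also has a binding for variable $?v$ and, since $\mu_1$ and $\mu_2$ are compatible, these bindings are the same, that is, $\mu_1(?v) = \mu_2(?v)$. Hence, for URI $u^* = \mu_1(?v)$, it holds that $u^* \in U_{\mathrm{tmp}}$. Then, by Definition 5, we obtain that $\mu_2 \in [\![q']\!]^{\{u^*\}}_W$, which shows that $\mu_2 \in \Omega_{\mathrm{tmp}}$ and, thus, we can conclude that $\Omega_m \supseteq [\![(q_1 \;\mathrm{AND}\; q_2 \;\mathrm{AND}\; \dots \;\mathrm{AND}\; q_m)]\!]^S_W$.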

A.13 Proof of Corollary 1 

Corollary 1 is an immediate consequence of Lemma 2. 


